MEMORANDUM

TO:	Software Development Team
FROM:	Katherine Hollenbach, Manager
SUBJECT:	Improving Reliability of Multithreaded Software
DATE:	March 2, 2006

After learning about race-condition bugs at Microsoft that slipped through years of code reviews, I have decided that we must revise our methods for testing multithreaded software. Race-condition bugs are very difficult to find because they are nondeterministic: a failure may appear only under a particular thread interleaving, so it may not recur from one run to the next. To foster the discovery and elimination of race-condition bugs in our products, I have asked our Software Testing Engineers to design and implement an application to run lockset analysis on our software.

Lockset analysis makes it possible to find variables that are shared by multiple threads but not consistently protected by a lock. For each shared variable, it tracks the set of locks that were held at every access to that variable (the variable's lockset), and it issues a warning if that set becomes empty. Those of you who are familiar with the lockset algorithm may object that it issues too many warnings when there is no danger of a data race, for example when accesses are ordered by some means other than a lock. It will take time (and manual labor) to identify the real problems, but we will no longer have to spend time trying to reproduce concurrency bugs just so we can identify and fix them.
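
For those who have not seen the algorithm, the idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the tool our Software Testing Engineers will build: the class name LocksetChecker, its methods, and the thread, lock, and variable names are all hypothetical, and the thread activity is replayed as a fixed sequence of events rather than observed from a running program.

```python
class LocksetChecker:
    """Illustrative sketch of lockset refinement (all names are hypothetical)."""

    def __init__(self):
        self.held = {}        # thread name -> set of locks it currently holds
        self.candidates = {}  # variable -> locks held at every access so far
        self.threads = {}     # variable -> threads that have accessed it
        self.warnings = []    # variables reported as possibly unprotected

    def acquire(self, thread, lock):
        self.held.setdefault(thread, set()).add(lock)

    def release(self, thread, lock):
        self.held.get(thread, set()).discard(lock)

    def access(self, thread, var):
        locks_held = self.held.get(thread, set())
        if var not in self.candidates:
            # First access: every lock held now is a candidate protector.
            self.candidates[var] = set(locks_held)
        else:
            # Later accesses: keep only locks held on every access so far.
            self.candidates[var] &= locks_held
        self.threads.setdefault(var, set()).add(thread)
        # Warn once when a variable touched by more than one thread
        # has no lock common to all of its accesses.
        shared = len(self.threads[var]) > 1
        if shared and not self.candidates[var] and var not in self.warnings:
            self.warnings.append(var)


checker = LocksetChecker()

# "balance" is accessed under the same lock by both threads: no warning.
for t in ("T1", "T2"):
    checker.acquire(t, "mu")
    checker.access(t, "balance")
    checker.release(t, "mu")

# "counter" is accessed without the lock by T1: a real problem.
checker.access("T1", "counter")
checker.acquire("T2", "mu")
checker.access("T2", "counter")
checker.release("T2", "mu")

# "config" is written by T1 and read by T2 with no lock. Suppose T2 is
# started only after T1 has finished, so no race is possible. The checker
# cannot see that ordering and warns anyway: a false alarm.
checker.access("T1", "config")
checker.access("T2", "config")

print(checker.warnings)
```

In this sketch the warning for "counter" points at a real defect, while the warning for "config" is the kind of false alarm we will have to review by hand.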

The Software Testing Engineers also considered alternatives to lockset analysis, such as happens-before algorithms, which can miss potential data races that do not show up in the particular thread schedule observed during a test run. Lockset analysis will give us the best opportunity to find and eliminate race conditions in our software, even if we have to carefully analyze which results are false alarms and which are real threats.

Programs that appear to be thread-safe can still have race-condition bugs. Therefore, all of our multithreaded software will be subject to lockset testing before it is released. Lockset testing should also be used during development to help identify race-condition bugs as they appear. Identifying and correcting race conditions early will allow us to produce software that is more reliable.