To: Software Development Team
From: Daniel Ring, Manager
Date: March 2, 2006
Subject: Protocol for Preventing Concurrency Bugs
Concurrency bugs, also known as race conditions, are errors that result when two or more threads access a shared variable simultaneously without synchronization and at least one of them modifies it. A recent report has highlighted the difficulty and importance of detecting and fixing concurrency bugs and has identified several deficiencies in a competitor's product (Yu, et al. 2005). Our product must be free of these weaknesses if it is to be successful.
To prevent concurrency bugs from destabilizing our product, developers should adhere to the following protocol.
All software developers will adhere strictly to the locking discipline. This requires that a shared object is accessed by a thread only while that thread has control of the object. Control is received using lock() or an environment-specific equivalent. lock() must guarantee that no other thread may access the object until control is explicitly released by unlock(). The thread may access the object only after lock() and before unlock() (Savage, et al. 1997).
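As an illustration of the discipline, the following is a minimal sketch in Python, assuming threading.Lock stands in for the environment's lock() and unlock(). The SharedCounter class and the run_workers helper are hypothetical names used only for this example; they are not part of our code base.

```python
import threading


class SharedCounter:
    """Hypothetical shared object: every access happens while its lock is held."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        self._lock.acquire()          # lock(): take control of the object
        try:
            self._value += 1          # access only between lock() and unlock()
        finally:
            self._lock.release()      # unlock(): explicitly release control

    def read(self):
        with self._lock:              # same discipline, written as a context manager
            return self._value


def run_workers(counter, n_threads=4, n_increments=10000):
    """Increment the counter from several threads and return the final value."""

    def worker():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.read()
```

Because every read and write of the counter's value sits between acquiring and releasing the same lock, run_workers should return n_threads * n_increments regardless of how the threads are scheduled. The try/finally form (or the with statement) is used so that control is released even if the protected code raises an exception.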
There are two exceptions to this rule (Savage, et al. 1997):
Initialization: An object may be initialized and manipulated by its originating thread without the use of lock() after it is allocated, but only before the object is available to additional threads.
Read-only variables: An object which is initialized but never modified may be accessed without lock().
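Both exceptions can be seen in one short sketch. It assumes Python's threading module; the names MAX_STEP, Buffer, and build_and_share are hypothetical and exist only to illustrate the two cases.

```python
import threading

# Read-only exception: set once here and never modified afterwards,
# so any thread may read it without lock().
MAX_STEP = 3  # hypothetical constant


class Buffer:
    """Hypothetical shared object guarded by its own lock."""

    def __init__(self, size):
        self.lock = threading.Lock()
        self.items = [0] * size


def build_and_share(size, n_threads):
    buf = Buffer(size)

    # Initialization exception: buf is not yet visible to any other thread,
    # so the originating thread may fill it without lock().
    for i in range(size):
        buf.items[i] = i

    def worker(index):
        # From here on the object is shared, so the locking discipline applies.
        with buf.lock:
            buf.items[index] += MAX_STEP   # reading MAX_STEP needs no lock

    threads = [
        threading.Thread(target=worker, args=(n % size,))
        for n in range(n_threads)
    ]
    for t in threads:          # starting the threads is what makes buf available
        t.start()
    for t in threads:
        t.join()

    with buf.lock:
        return list(buf.items)
```

The point at which the object becomes available to other threads (here, the first call to start()) is the boundary after which unlocked access is no longer permitted.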
The locking discipline is unambiguous and conformance is cheap in terms of programming hours and effort, whereas failure to conform may result in a lengthy and expensive test/debug cycle. A delayed or flawed application release would be detrimental to consumer satisfaction, market share, and job security.
All shared variables will be documented. The documentation will include:
Origin: where and when the object is allocated and initialized.
Availability: when and to which threads the object becomes available.
Exceptions: whether and why the object is subject to either of the above exceptions to the locking discipline.
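One possible way to keep this documentation in a form that tools can read is sketched below. This is an assumption about format, not a required template: the SharedVariableDoc record, its field names, and the unexplained_warnings helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_EXCEPTIONS = (None, "initialization", "read-only")


@dataclass(frozen=True)
class SharedVariableDoc:
    """Hypothetical documentation record for one shared variable."""

    name: str
    origin: str                        # where and when allocated and initialized
    availability: str                  # when and to which threads it becomes available
    exception: Optional[str] = None    # "initialization", "read-only", or None
    exception_reason: str = ""         # why the exception applies

    def __post_init__(self):
        if self.exception not in ALLOWED_EXCEPTIONS:
            raise ValueError(f"unknown exception type: {self.exception!r}")
        if self.exception is not None and not self.exception_reason:
            raise ValueError("a claimed exception must state why it applies")


def unexplained_warnings(warned_names, docs):
    """Return the warned variable names that have no documented exception."""
    excused = {d.name for d in docs if d.exception is not None}
    return [name for name in warned_names if name not in excused]
```

A record like this could sit next to the variable's declaration. Requiring a reason whenever an exception is claimed keeps the "whether and why" part of the documentation from being left blank.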
The priority is to incorporate thread-safe design into the initial implementation. However, because programming errors are unavoidable, two additional tests will be added to the testing phase of development. The first will employ static race detection at compile time to verify that the locking discipline has been applied to all appropriate objects. The documentation described above will be used to remove false positives due to valid exceptions. The second test will employ RaceTrack (http://research.microsoft.com/research/sv/racetrack/) or an environment-appropriate alternative (if available) to further verify reliability (Yu, et al. 2005).
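For developers unfamiliar with how a detector can check the locking discipline, the sketch below shows a simplified version of the lockset idea described by Savage, et al. (1997): for each shared variable, keep the set of locks held on every access so far, and warn when that set becomes empty. It is not RaceTrack's algorithm and not the interface of any real tool. The LocksetChecker class and its method names are hypothetical, and the sketch omits the refinements for the initialization and read-only exceptions, so it would report those cases as warnings.

```python
class LocksetChecker:
    """Simplified, hypothetical lockset checker driven by explicit events."""

    def __init__(self):
        self.held = {}         # thread name -> set of locks it currently holds
        self.candidates = {}   # variable name -> locks held on every access so far
        self.warnings = []     # variables with no consistently held lock

    def lock(self, thread, lock_name):
        self.held.setdefault(thread, set()).add(lock_name)

    def unlock(self, thread, lock_name):
        self.held.setdefault(thread, set()).discard(lock_name)

    def access(self, thread, variable):
        held_now = self.held.get(thread, set())
        if variable not in self.candidates:
            # First access: every lock held now is a candidate protector.
            self.candidates[variable] = set(held_now)
        else:
            # Later accesses: keep only the locks held every time.
            self.candidates[variable] &= held_now
        if not self.candidates[variable] and variable not in self.warnings:
            self.warnings.append(variable)


# Usage: thread T1 follows the discipline, thread T2 does not.
checker = LocksetChecker()
checker.lock("T1", "m")
checker.access("T1", "balance")
checker.unlock("T1", "m")
checker.access("T2", "balance")    # no lock held, so the candidate set empties
```

After these events, checker.warnings contains "balance". A variable reported this way is either a genuine violation or one of the documented exceptions, which is why the shared-variable documentation is needed to separate the two.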
Yu, Y., Rodeheffer, T., Chen, W. (October 23-26, 2005). RaceTrack: Efficient Detection of Data Race Conditions via Adaptive Tracking. In 20th ACM Symposium on Operating Systems Principles (SOSP) (pp. 221-234). Brighton, UK.
Savage, S., Burrows, M., Nelson, G., Sobalvarro, P., Anderson, T. (November 1997). Eraser: A Dynamic Data Race Detector for Multithreaded Programs. ACM Transactions on Computer Systems, 15(4), 391-411.