by J. H. Saltzer, last updated February 1, 2003.
(It is a pun on "Eliminate RACEs")
(The first answer is usually lock(x). But that answer illustrates a misunderstanding caused by a misleading name. The ordinary English language meaning of "lock" is that is is a barrier that prevents access, and the obvious analogy to "lock(house)" is wrong.)
(By lock(Y), where Y is a flag associated with X. The meaning of "lock" is "I just acquired a flag that is intended to protect object X--if all other programs that touch X are correct. The interesting observation is that it is not a barrier, it doesn't by itself prevent access like a door lock, and most important of all, we can't verify correctness by local analysis. We must inspect every other program that touches X.)
(It contradicts modularity. You have to look at all the other modules to figure out whether or not this module is programmed correctly. We will also see later that locks make it hard to compose large modules out of smaller ones; you need to know what locks the smaller ones are using. Locks and modularity are at odds with one another.)
The inability to be sure you have done the right thing by local inspection is one of the reasons we need Eraser.
(The implementation of chapter 3 certainly doesn't enforce any such restriction, though it would be easy to add it. Some locking systems enforce those semantics, others don't.)
(It does need to know which thread is setting a lock, in order to run more advanced versions of the lock-set algorithm. But we haven't gotten that far yet, so let's bookmark that question.)
(The programmer of a different thread that wants to use V needs to know which lock to set. If the answer depends on where some other thread happens to be executing, this leaves the first programmer without any way to decide.)
(Not if there could be two threads running the original code of figure 3.)
(Introduce a third lock at the beginning and end of the code for the first thread. The new lock assures that the first piece of code can be executed by only one thread at a time. Having the second thread set both locks then eliminates the race. But Eraser continues to raise an alert. We just found another category of false alarm that the authors didn't mention.)
(Before you run a program, modify the binary--instrument it so that Eraser can watch every data reference, then let it run and see if it violates its own locking protocol. The lockset algorithm is a protocol checker.)
(No. Any locking errors in parts of the code that the program doesn't happen to execute won't be tested. This is risky.)
("We have been running this program for four years and it never failed before. Why is it suddenly crashing every five minutes?" Eraser can help find bugs, but it does not provide assurance. If a naive programmer thinks it does, he may have false sense of security. Analogy: running a spell-checker may mean that all the words in your paper are spelled right, but it doesn't mean that there are no typos; you may have typed "along" where you meant "among".)
(Because not every variable is both shared and modified, and it is only shared-modified variables that can be the source of races. So the state diagram shows a way discovering which variables are actually shared-modified.)
(Oops, a bug in the description. The figure is right. Looking at the later description of the implementation, any write will take it to shared-modified. Once it is shared it is running the lockset algorithm without giving warnings, which means that the per-variable shadow area contains the lockset pointer, so it can no longer be keeping track of the thread number of the original writer. We can also reason from what it should do. If anyone is writing into a variable that at least one other thread has been reading from, we have a possibility of a race, so we had better we raising alerts if the locking protocol is violated. [a legalistic reading of the text can claim that it is technically accurate; it is true that a write access from a new thread in the Shared state does take it to the Shared-Modified state; they just didn't bother to mention that a write access from the old thread in the Shared state also takes the variable to the Shared-Modified state. Under that interpretation the sin is that the authors forgot to mention one important case.])
(
thread 1 thread 2 x = 10 (initialization) acquire(xlock); if x > 5 x = 2 y = x*3; else y = 0; release(xlock);)
(Yes. Although in the paragraph on page 398 it appears that perhaps there is some subtle difference, from the previous page it is pretty clear that the authors are using the words as synonyms. This is an interesting example of a difference between technical writing and literary writing. In English class you learned to avoid repeating the same word; use synonyms to make the sentence more interesting. In technical prose it is more important to be precise, so using the same word has a benefit. When you use a different word, the reader assumes that--or at least wonders if--you have some subtle distinction in mind. In a technical context, the literary approach can introduce unnecessary ambiguity and slow down the reader.)
(By adding calls in the program that say "Eraser, ignore the following code, I know it is OK". The risk is that the following code has a subtle race condition that eraser would have noticed but that the programmer missed.)
(The state diagram of figure 4 has arcs labeled "first thread" and "new thread", so it certainly needs to know who is setting a lock. But presumably Erasericould find that out by looking in some currentÐthread system variable. And it is certainly true that a locking protocol in which one thread acquires a lock and another thread releases it is going to be hard to debug. But it doesn't seem that Eraser would give different answers if the discipline of only the owner can release a lock is abandoned.)
if(p->ip_fp == (N12_XFILE *) 0 ) {...This is pretty baffling. What does the syntax mean?
(Very few students are likely to be this familiar with the C language. To the left of the == is a pointer-based reference to a variable. To the right is a zero preceded by a parenthetical expression, called a "cast", that coerces the zero to make it into the same type as the thing on the left. Since we haven't seen the declaration for the thing on the left or the definition of N12_XFILE, this is not especially obvious. The cast is to "pointer to an object of type N12_XFILE")
if (A < 5) acquire (lock_a) if(A < 5) A = A + 1 release (lock_a)
(To optimize the common case. Suppose that this code is in the middle of loop that is executed a million times, and A will be less than five only once or twice. If so, in almost every execution the first statement will quickly rule out the possibility that A is less than five. If it finds that A appears to be less than five, then it is worth going to the trouble of acquiring the lock. Of course, A may have changed by now, so it needs to be tested again, under the protection of the lock. If it is still less than five, we increment it. If not we just release the lock and go about our business as if the test had failed when we first looked. Since the outcome of the first test is used only as a hint that is later reverified, it doesn't matter that there is a race.)
(Eraser's dynamic tracking depends on statically inserting into the program a set of instrumentation probes--actually subroutine calls. Code generated by the program on the fly won't have these probes installed, so it may set locks or touch shared variables that Eraser won't find out about.)
Saltzer@mit.edu