By J. H. Saltzer, February 4, 1997, updated 1997, 1999, 2000, 2002, 2003, 2004
Instructors: It is worth checking out a recent comment on this paper. The comment points out that Gabriel's paper has a tendency to be misused by people who haven't really understood what it is saying. We may return to this topic at the end of the term by asking the class to read the comment then.
First paragraph on second page: How many people in the class think they could explain the PC-losering problem?
Now, the real question: What should you do when you run across something that is completely unknown?
(Bookmark it. Build a mental model of what you conjecture the thing to be. Decide whether or not it matters to the understanding of the paper--often it doesn't. (Did it matter what CLOS meant?) If it is beginning to look crucial, and not enough of the mental model is coming together, try a Google search to see if you can find a definition or description. Otherwise, keep slogging and ask a question when you come to class. It turns out you can figure out what this paper is about without knowing what PC-losering is.)
Consider a simple example: what should x + 1 evaluate to?
(If your favorite language is Scheme, it always calculates a value that is an integer one larger than x. By almost any definition, this is the right thing.
In almost any other language you get either something like (x + 1) mod 2^32 or the possibility of an overflow exception. The simple implementation produces a complex interface, almost certainly not what the programmer really wanted. "It is more important for the implementation to be simple than the interface." And it is easy to write code that will occasionally fail. But languages that implement big numbers are scarce. Why? And what about x/3? The right thing (rational number arithmetic) is pretty fierce to implement.)
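A minimal C sketch of the contrast (the function name is mine; the wraparound shown is what C guarantees only for unsigned types, since signed overflow is undefined behavior):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* "Worse is better": expose the hardware add directly.
       For a 32-bit unsigned x, the result is (x + 1) mod 2^32. */
    uint32_t plus_one(uint32_t x) {
        return x + 1;  /* silently wraps to 0 at UINT32_MAX */
    }

    int main(void) {
        printf("%" PRIu32 "\n", plus_one(UINT32_MAX));  /* prints 0, not 4294967296 */
        return 0;
    }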
Now consider another example: deciding whether a given year is a leap year. What does the "worse is better" school say?
("It is slightly better to be simple than correct." Just calculate the year mod 4, and if the result is zero we have a leap year. This calculation gives the right answer from the dawn of computing through 2099 and after four years of field experience the implementation is unlikely to have any bugs in it.)
What does "the right thing" school say?
("Incorrectness is simply not allowed." We do a bunch of extra arithmetic to check for divisible by 100 and divisible by 400, with various conditionals to decide which case we are presented with. Some of this code won't be exercised for nearly 400 years, at which time some, but not necessarily all, of the remaining coding bugs will emerge. So "worse is better" seems to have an identifiable advantage, at least for a century or so.)
Is that the end of the discussion?
(David Karger points out that rarity isn't really the right measure; rather, one should think about "expected utility"---the odds that something will go wrong times the cost if it does. In designing deep space probes one should do the right thing, even for inconceivably rare events, because the cost of incorrectness is so high. Thus, the designers invest huge amounts of time designing/implementing/testing in the first place.
Conversely, in a desktop computer for home use, the cost of a failure is negligible (the vendor has no liability, and anyway users rarely lose enough to get horrifically upset) while the investment to do it right could be huge. So designers for this application take the "good enough" approach.)
(In Multics--a classic Right Thing design from 1965--file names could be 32 characters long. Because RAM and disk space were quite expensive, and I/O channels were slow, it was unthinkable to allocate that much space for names that were actually shorter, so dynamic allocation of space within directory entries was required. The directory manager was an alarmingly slow program, even for the era.
In 1981, MS-DOS chose to use 8.3 file names, for the sake of a quick and simple implementation, using fixed space allocation. From a user's point of view, being limited to short file names, while not the end of the world, is a significant nuisance; short names lead to various forms of inappropriate labeling of objects. So we have here a trade in which implementation simplicity is gained at the cost of a clumsier user interface.
The 1984 Macintosh Operating System came up with a shrewd compromise: 32-character file names, but implemented with fixed allocation. Disks and RAM had gotten cheap enough that one could have one's cake and eat it, too. Right Thing and Worse is Better converged to produce Right Thing is Better in this case.)
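To make the allocation trade concrete, here is a C sketch of the three directory-entry strategies (the struct layouts are illustrative inventions, not the actual Multics, MS-DOS, or Macintosh on-disk formats):

    /* MS-DOS style: short and fixed -- trivial to allocate and scan. */
    struct dos_entry {
        char name[8];
        char ext[3];
    };

    /* Macintosh style: long but still fixed -- just as simple once
       32 bytes per name is affordable. */
    struct mac_entry {
        char name[32];
    };

    /* Multics style: dynamic -- no wasted space, but the directory
       manager must handle variable-length records. */
    struct multics_entry {
        unsigned short name_len;
        char name[];    /* C99 flexible array member, allocated to fit */
    };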
([suggested by Ron Rivest] Gabriel says "A further benefit of the worse-is-better philosophy is that the programmer is conditioned to sacrifice some safety, convenience, and hassle to get good performance and resource use."
An example is the frequent use in C programs of gets(char *s) rather than fgets(char *s, int n, FILE *stream). The gets interface stores into its argument whatever string is available for input. The fgets interface limits the length of the string stored to the size specified by its second argument. As a result, gets is frequently implicated in buffer overflows, which in turn are frequently implicated in security holes. Using gets appears to be a classic example of "worse-is-better" thinking (it is "slightly better to be simple than to be correct").
Quoting Rivest: 'Perhaps one can generalize: for security, "the right thing" is the only thing...')
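A minimal C sketch of the contrast (the buffer size is illustrative; the gets call is commented out because gets was removed from the language in C11, for exactly this reason):

    #include <stdio.h>

    int main(void) {
        char buf[16];

        /* "Worse is better": gets has no way to know how big buf is,
           so input longer than 15 characters overruns it. */
        /* gets(buf); */

        /* "The right thing": fgets stores at most sizeof buf bytes,
           including the terminating NUL. */
        if (fgets(buf, sizeof buf, stdin) != NULL)
            printf("read: %s", buf);
        return 0;
    }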
([From Simson Garfinkel] Consistency means things like using the same standard for "what is a string" everywhere in the API.
The WIN32 API uses at least 4 different standards (sketched as hypothetical prototypes below):
func(char *)             - char * is a null-terminated string
func(bytes[], int len)   - bytes[] is an array; len is its length
func(bytes[], int len-1) - bytes[] is an array; len is 1 less than its length
func(words[], int len)   - words[] is 16-bit Unicode; len is # of bytes * 2
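Hypothetical C prototypes making the four conventions concrete (none of these is an actual WIN32 function; they only illustrate the four calling styles):

    /* 1. NUL-terminated string */
    void take_cstring(const char *s);

    /* 2. Array plus its length */
    void take_counted(const char bytes[], int len);

    /* 3. Array plus one less than its length */
    void take_counted_minus_one(const char bytes[], int len_minus_one);

    /* 4. 16-bit Unicode array plus a length in the API's own units */
    void take_wide(const unsigned short words[], int len);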
It means consistent spelling. Unix sometimes abbreviates, like creat(), and other times does not, like readdir(). It means consistent inter-word conventions, and not using both something_else() and somethingElse() and even something_Else().
Keeping an API consistent is a problem when you are working on a large software project with many different contributors. It takes extra time to go back and make everything consistent. The advantage is that it is easier for developers to write code against a consistent API. Microsoft's code annotation effort has tried to document all of the different string standards so that the compiler can check for buffer overruns.
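A sketch of what such an annotation looks like (the function is hypothetical; _Out_writes_z_ and _In_z_ are annotations from Microsoft's SAL, which requires <sal.h> and a Microsoft toolchain):

    #include <sal.h>
    #include <stddef.h>

    /* The annotation ties dest to dest_size, so a static analyzer can
       flag any call that might overrun dest. */
    void copy_name(_Out_writes_z_(dest_size) char *dest,
                   size_t dest_size,
                   _In_z_ const char *src);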
Perl does not have a consistent interface. There are typically five or six ways to do anything you want. This is actually put forth as a feature of the language. For example, you can call a perl function like this() or like &this() or, in some cases, like &this.)
It is helpful to distinguish three levels of possible questioning of Gabriel's point:
1. The concept of a "worse is better" design philosophy versus a "right thing" philosophy. These represent poles on a spectrum, but the distinction is a useful characterization.
2. The application of that concept to characterize UNIX and LISP/CLOS. The application is at best a caricature, since any real system is likely to exhibit examples from both ends of the spectrum and lots of points in between. This leads to questions such as which philosophy dominated a given system design. Working backwards, imagine how Gabriel came up with his observation: presumably by thinking long and hard about how to characterize what he saw as big differences between the UNIX and the LISP system designs.
3. The application of that concept to a particular example. This is especially tricky, because the motivations for any interesting design decision are multiple, and there are usually silver linings in the darkest of clouds. And in the case of PC-losering, the UNIX design is arguably just as much the right thing.
Saltzer@mit.edu