6.033 Discussion Suggestions (Simon Complexity paper)

6.033--Computer System Engineering

Suggestions for classroom discussion of:

Herbert A. Simon. The Architecture of Complexity. Proceedings American Philosophical Society 106, 6 (December, 1962) pages 467-482. Also available as Part 4, pages 84-118, of The Sciences of the Artificial. M. I. T. Press, 1969. ISBN: 0-262-191051-6 (hardbound) 0-262-69023-3 (paperback).

by J. H. Saltzer, February 7, 1996, updated February 6, 1998, February 7 2002, and February 1 2003.

What is this paper on biology, sociology, etc., doing in 6.033? Where was it published? When? Is it still pertinent, or has the world (and technology of computing) changed so much since 1962 that it no longer applies?
Who is this guy Simon? Does he have any credentials to be making such sweeping generalizations? (1975 Turing award, 1978 Nobel Prize in economics, helped develop dynamic programming, co-creator of the General Problem Solver, one of the founders of AI. Well-known generalist. Helped form the business school, the school of computer science, and the psychology department at CMU.)
Are his citations broad or narrow, many or few, just for support or indicative of other things worth reading? (Note that citations are interspersed in the footnotes, rather than collected at the end. They cover an incredibly wide range of kinds of sources, from chemistry textbooks to Plato.)
What is the main thesis? Is it that you should design systems hierarchically? (No. It is that complex systems evolve, whether naturally or under human guidance, in the direction of hierarchy. Or, put more bluntly, as systems grow more complex, only the hierarchical ones survive. That is subtly different from saying that to build a complex system you should start by making it hierarchical.)
After reading this paper, can you think of any examples of large, complex systems that are NOT organized hierarchically?
- Thermodynamic systems (The weather is a good example. Meteorologists model it hierarchically, but only because that makes the model tractable, not because it is a good model.)
- The World-Wide Web (but compare with Yahoo, which imposes a hierarchy)
- A chart of who knows whom, for everyone on Earth (though Simon on page 462, column 1, uses this as an example of hierarchy; he manages to see hierarchy everywhere.)
- Napster, Kazaa, and their peer-to-peer relatives.
note that crystal structures have many components and are at best a flat hierarchy, but it isn't clear that they should be considered complex.
So why shouldn't you build a complex system hierarchically? What are the arguments AGAINST hierarchy? Why doesn't everyone do it for all systems?
- no arguments against appear in this paper
- It may take more components
- It may constrain out a needed function
- It can be very hard to find the correct hierarchy among thousands of competing, equally plausible, but wrong organizations
- Hierarchical designs may be hard to adapt to different circumstances.
Is "Ontogeny recapitulates Phylogeny" relevant to computer systems? Does Windows 95 show any evidence of this idea?
(First, recognize that in the years since this paper was written, embryologists have decided that the concept is bogus. But as for Windows it seems to be real. It is reported--but not confirmed--that the Windows 95 boot sequence goes as follows:
16-bit addressing BIOS ==> 32-bit addressing BIOS ==> DOS ==> Windows
and that it is difficult to boot Windows on a machine that does not support 16-bit addressing and some other arcane interrupt features from the 8086 days. Neither of these features are used in Windows, but are needed simply during the ontogeny of Windows. [example thanks to Larry Rudolph])
Simon wrote this paper in 1961, at which time technology wasn't changing nearly so fast. So he hadn't yet seen the effects of 10X cheaper, 10X bigger, and 10X faster every three years continuously for three decades. His analysis considered only static, or quasi-static systems. Is anything different when technology changes rapidly?
(Simon suggests that the only systems that survive evolution are the hierarchical ones. But perhaps we are now seeing systems that haven't (yet) had a chance to evolve; they sprang into existence from the heads of the people who conceived them. So the evolutionary forces that Simon describes as leading to hierarchy aren't at work--or more likely haven't yet had a chance to work.
Some examples. At the outset,
- MS-DOS showed negligible hierarchy (Windows has added some)
- the World-Wide Web had no hierarchy (Yahoo is imposing some)
- file systems were flat (Multics added the naming hierarchy)
- network names were flat (DNS added a hierarchy)
[line of thinking suggested by Larry Rudolph])
How many examples of hierarchy can you find in your favorite computer system?
- File system naming hierarchy
- Control of access control lists
- Nested atomic transactions
- Layered Interpreter implementations
- Network protocol towers
- Network naming and routing schemes
- gate/chip/board/module/workstation
- bit/word/page/memory-card
- word/object/catalog/file-system
The last two examples suggest that the physical hierarchy and the logical hierarchy may be different for the same system. For example, one has a physical hierarchy of
- book/volume/page/line/character
and a logical hierarchy of
- book/chapter/section/paragraph/word.
Where do these two hierarchies conform?
- successive paragraphs are usually on the same page
- each chapter starts on a new right page and is probably not split between volumes
- each line starts with a new word
What are the pressures that make them conform?
- reliability. If each page contained one paragraph from each chapter, then tearing out one page might make the whole book useless.
- speed of access. Same example.
- complexity of access. Same example.
Dual-direction hierarchies: interpreter is an example. As you go down through a tower of interpreters, each instruction is expanded into many steps at the next lower level. As you go up, each interpreter is able to handle many different applications from the next level above. You can draw two different trees, depending on which insight you are trying to expose.
Back to the network name example. The reason that systems evolve toward hierarchy is that assembly of stable subsystems is less hazard-prone (the watchmaker parable illustrates). Biological systems are always being assembled, so this effect is very important there. But for the computer named ginger.lcs.mit.edu the naming system was assembled once (twice, actually). There is no need for stable subsystems. So why is it hierarchical? (It is taking advantage of a completely different feature of hierarchy: delegation. The goal is that every one of the 450 million computers on the Internet should have a different, unique name. If we tried to do that by having someone at the Hague assigning names, the system couldn't possibly work. Instead, someone at the Hague devised the named "edu" and delegated the resposibility for naming things within that domain to someone else. That person named "mit" and delegated there responsibility for the next level to the MIT network group. That group named "lcs" and delegated responsibility to someone at Tech Square. That guy can choose names like "ginger" and be confident that it is unique.
The interesting thing here is that hierarchy has at least two distinct uses: stability and delegation.)
If you know anything about probability analysis, can you follow Simon's calculation that suggests that Hora is 4000 times as productive as Tempus? (You shouldn't be able to--Nancy Lynch points out that the calculation is bogus. It is a credit to Simon's intuition that the answer is off by only a factor of two.)
How could such an error have made it into a respected, refereed journal? (This is a good opportunity to bring up the procedure by which papers get into journals: author submits, editor sends it out to referees who read and offer opinions to accept, revise, or reject. But this is a journal of the American Philosophical Society, not the American Math Society. The members who were asked to review it probably didn't know much about probability, and trusted Simon to do that part right. It is more likely that they had concerns with Simon's casual treatment of Alexander's empire vis-a-vis Lawrence of Arabia.)
So what is the right method and answer?
(Here is my rendition of Tom Leighton's analysis of Simon's parable of the watchmakers, with one minor summation error corrected:
Assume a watchmaker has managed to assemble (k-1) components and consider for a moment how many tries, T, it will take to add just the k^th component to the assembly. If the probability of being interrupted is p, the chance of success on the first try is (1-p), on the second try is p(1-p), on the third try is p²(1-p), etc. The expected number of tries is thus
```
      T = (1-p) + 2p(1-p) + 3p²(1-p) + ... = 1/(1-p)
```
Of course, on each such try the watchmaker will first have to assemble all (k-1) previous components. So how many steps will it take to get to the point that k components are assembled? Let S_k be the expected number of steps required to assemble k components. Wald's theorem says that if each assembly step is independent of the next (that is, T is independent of k) we can simply add the expected values, which allows us to make a recursive statement about S_k:
```
      S_k = T + T*S_k-1 
```
That is, the total number of steps will be the expected number of tries required to add that last component plus the expected number of steps to assemble the (k-1) previous components, which we will, unfortunately, have to repeat T times. The recursion terminates by noting that
```
      S₀ = 0 
```
That is, an assembly of zero components requires zero steps.
This recurrence can be written
```
      S_k = T + T² + T³ + ... + T^k
```
which series sums, for finite, positive k and any value of T, to
```
                T^k - 1
      S_k =  T * -------
                T - 1
```
Going back to Simon's paper, when p = .01, T = (1/.99). Using k = 1000 for Tempus and k = 10 for Hora (and multiplying Hora's result by 111 subassemblies) we get
```
     Tempus:               2316257 steps 
```
```
     Hora:  10.577 * 111 =    1174 steps 
```
to assemble one watch apiece, and we conclude that Hora can produce 1973 times as many watches per unit time as can Tempus. [Thanks to Chandra Boyapati for helping to find the summation error.])
What can we conclude from all this?
(Simon's intuition was good: he knew what ballpark the answer was in, and he correctly chose the parameters of the problem (size of assembly, subassemblies, and probability of interruption) so that they would yield a dramatic difference.
But having gotten that far, he then tried to demonstrate the answer with scientific precision. Actually, he did much of that calculation correctly. He described three ratios, he correctly calculated each one, and he even left enough tracks behind that we could follow him and check his calculations. So far, so good. But for the final step that went wrong, he left no tracks whatever. Without explaining why, he multiplied the three ratios together and incorrectly offered this product as the answer to the original question.
Note that Simon's second ratio (Tempus loses 20 times as much work as Hora per interruption) seems unjustified, but it is actually OK. Assuming that Hora's assembly process is independent of the telephone ringing, when the phone rings the current assembly would on average be about half done and five steps would be lost. This is a good approximation for Hora's case, and it may be that Simon tried to explain it but the journal editors blew it. The (correct) phrase regarding Hora that "each interruption will cost only about the time to assemble five parts" has a footnote, but the footnote that appears at the bottom of the column has nothing whatever to do with the subject at hand. This is probably an editing or typesetting/proofreading goof. But because this ratio is near the heart of the calculation mistake, the editing error compounds the situation and has led several people to mistakenly believe that this ratio was calculated incorrectly. It isn't that the ratio was calculated wrong, it is the wrong ratio to calculate. Simon should have calculated for the second ratio the expected number of steps lost per assembly, rather than steps lost per interruption.
There are several lessons here:
- It is risky to tackle a probability problem using only intuition, no matter how good. There are too many corners where intuition fails, and a systematic accounting of the sample space is always the better way to proceed.
- You can't believe everything written by a Nobel-prize winner (even before he wins the prize)
- Even respected journals make editing mistakes and have referees that miss some errors.
(This one makes more sense when this paper is read at the end of the term) What connections can we find with Brooks' Mythical Man Month?
- Chief programmer team is hierarchical, sort of
- "Plan to throw one away" relates to evolution and discovery of right hierarchy
- The second system effect also relates to evolution
- The tarpit might relate to lack of hierarchy

Comments and suggestions: Saltzer@mit.edu