Effy Oz. When professional standards are lax: the CONFIRM failure and its lessons. Communications of the ACM 37, 10 (October, 1994) pages 30-36.
by J. H. Saltzer, April 24, 1996, from 1995 notes.
One fruitful approach to discussing this paper is to look in it for examples of the problems identified in our discussions of complexity and in Brooks' book.

1. The Therac-25 paper and this paper are the only ones we have seen this term that provide any level of detail about disasters. There are practically no others. Why don't we see more papers that explain disasters? (Companies don't like to discuss losing ideas; they want to project a corporate image that the company is associated with winning ideas. There may be lawsuits hanging on who gets blamed. There are usually multiple causes and it is hard work to tease them apart to see what really happened. The individuals involved may be embarrassed to admit that they made key, incorrect decisions, or they may feel that their future career would be jeopardized. Everyone wants to get on with the next project, not do post-mortems on the last one. Many such projects trail on for a surprisingly long time before they are finally put out of their misery; during this time the original principals have all moved on to other projects and are hard to locate.)

2. Can we find examples of things warned about by Brooks? (The second-system effect is the main one. The new system is supposed to be much better than the existing systems.)

3. What evidence is there of the second-system effect? (Buzzwords in the discussion of its objectives: "state of the art," "functionally richer," "costs will be less," "superior to any current reservation system," "completed in time to outpace the competition.")

4. Is there any evidence of the Mythical Man-Month? (The preliminary design team spent $1.5M in only 5 months, a rate of $300K/month. If the staff earned $100K/year, there must have been at least 35 people working on that preliminary design. The main project was budgeted at $56M over 52 months, a rate of $1.1M/month, enough to pay for about 130 professionals at the same salary.)

5. Is there any evidence of the bad-news diode? (Everywhere.
Spring 1990: "employees estimated CONFIRM would not be ready in time; they were instructed to change their revised dates so that they reflect the original project calendar." Spring 1991: employees were told to change timetables to meet the schedule or be fired. Management was eventually fired for not revealing the true status of the project. Summer 1991: a consultant was hired; his review was too negative, so his report was buried and he was fired.)

6. Another interesting approach is to plot a time-line:

   Sep. '88      Project launch; completion target June 1992; cost $55M
   Feb. '90      1 quarter behind schedule
   Oct. '90      1 year behind schedule after 2 years of work
                 (fell 9 months farther behind in 8 months)
   Feb. '91      Replan: new cost $92M
                 June '92 will have only some features
                 Full features in March '93
   Apr. 30, '92  15-18 months from completion = Oct. 30, '93,
                 7 months behind schedule after 1 year of work on Replan.
                 Second estimate: 2 years behind schedule after 1 year of work.
   Jul. '92      Project cancelled; $125M spent, plus opportunity losses of $160M

8. The paper suggests that the heart of the problem lies in ethical considerations. Do you think that if all the management people had acted ethically according to Oz's prescription, the system would have been successfully delivered on time? (Almost certainly not. The article doesn't mention that they chose to develop the system using an IBM system extension called "Transaction Processing Facility" (TPF). Unfortunately, if you use TPF your application must be written in machine language. The admonition to "use sharp tools" was ignored. Chances are this project was doomed from the outset; the management attempts to hide the problems probably only had the effect of delaying recognition of the disaster.)

9. Are these system objectives beyond what the state of the art can accomplish? (No. Hyatt put together a system based on Unix, the Informix database system, and the Novell Tuxedo transaction processing monitor.
The prototype was working in three months, and the entire project, including hardware, cost $15M; it is apparently working just fine, handling 1,000 booking agents. It is claimed to have been delivered ahead of schedule, under budget, and with extra features.)
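
The back-of-the-envelope arithmetic in questions 4 and 6 is easy to check mechanically. The Python sketch below is not part of the original notes. It assumes, as question 4 does, a cost of $100K per person per year, and it further assumes that the entire budget goes to salaries. The helper names (monthly_burn, implied_headcount, months_between, add_months) are made up for illustration.

```python
from datetime import date

# Assumption taken from question 4: each staff member costs about $100K/year.
ASSUMED_SALARY_PER_YEAR = 100_000


def monthly_burn(total_cost, months):
    """Average spending per month over the period."""
    return total_cost / months


def implied_headcount(total_cost, months, salary_per_year=ASSUMED_SALARY_PER_YEAR):
    """People the budget would pay for if all of it went to salaries."""
    return monthly_burn(total_cost, months) / (salary_per_year / 12)


def months_between(start, end):
    """Calendar months from start to end, ignoring the day of the month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def add_months(d, n):
    """First day of the month that falls n calendar months after d."""
    index = d.year * 12 + (d.month - 1) + n
    return date(index // 12, index % 12 + 1, 1)


# Question 4: preliminary design ($1.5M in 5 months) and main project ($56M in 52 months).
prelim = implied_headcount(1_500_000, 5)
main = implied_headcount(56_000_000, 52)

# Question 6: slip between the Feb. '90 and Oct. '90 status points.
elapsed = months_between(date(1990, 2, 1), date(1990, 10, 1))
slip_added = 12 - 3              # "1 year behind" minus "1 quarter behind", in months
slip_rate = slip_added / elapsed  # months of schedule lost per month of work

# Question 6: the Apr. '92 estimate (upper bound, 18 months) against the Replan target.
projected = add_months(date(1992, 4, 1), 18)
replan_slip = months_between(date(1993, 3, 1), projected)

print(f"preliminary design: ${monthly_burn(1_500_000, 5):,.0f}/month, about {prelim:.0f} people")
print(f"main project: ${monthly_burn(56_000_000, 52):,.0f}/month, about {main:.0f} people")
print(f"Feb-Oct 1990: lost {slip_added} months in {elapsed} months ({slip_rate:.2f} per month)")
print(f"Apr 1992 estimate: done {projected:%b %Y}, {replan_slip} months after the Replan target")
```

Under these assumptions the preliminary team comes out at 36 people and the main project at roughly 129. The slip rate for 1990 is greater than one, meaning the project was losing more than a month of schedule for every month worked. A different salary assumption, or counting overhead and hardware in the budget, would lower the headcounts accordingly.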