6.033: Computer System Engineering - Spring 2001

------------

Expectation of code reuse in developing new software

Winnie Chan

The authors of the Therac-25 control software adapted faulty routines that led to tragic accidents; however, this outcome of the decision to borrow code is not representative of what may be expected of code reuse in general. In the case of Therac-25, poor software engineering practices, such as minimal software documentation and inadequate unit testing, were the leading causes of the accidents. The software developers borrowed code from Therac-20 to handle the dual mode of X-rays and electrons, but they wrote their own stand-alone operating system to regulate safety. This paper provides a brief analysis of how the developers' poor engineering practices, in both reused and newly written code, directly impaired the safety of Therac-25.

One example of the software developers' poor engineering practices was their naive assumption that using off-the-shelf software would increase safety because that software would already have been thoroughly tested. For example, Therac-25 used modified software from Therac-20 to handle the dual mode of X-rays and electrons. Because of inadequate unit testing, flaws in the borrowed code were propagated into the Therac-25 software. One notable bug inherited from Therac-20 was contained in a data-entry routine called Datent, which allowed the code to proceed to Set-Up Test before the full prescription had been entered and acted upon. Due to flawed control logic, if the keyboard handler set the data-entry completion variable before the operator had changed the data in MEOS (mode/energy offset), Datent would not detect the changes in MEOS. The software failed to include checks to detect the inconsistency between the low-order byte of MEOS and the parameters set by the high-order byte. Rigorous testing almost certainly would have exposed the bugs in the data-entry routines and prevented the operator's edit from being ignored and the Tyler radiation overdose from occurring.
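The stale-read pattern described above can be illustrated with a small, deterministic sketch. This is not the Therac-25 code: the names Console, configure_hardware, edit_during_setup and recheck are hypothetical, and the real system involved concurrent tasks rather than a single function. The sketch only shows why acting on a snapshot taken when the completion flag is set, without a later consistency check, lets an edit go unnoticed.

```python
from dataclasses import dataclass


@dataclass
class Console:
    """Hypothetical stand-in for the operator's data-entry state."""
    prescribed_mode: str = "xray"   # what the operator currently has on screen
    entry_complete: bool = False    # set by the (hypothetical) keyboard handler


def configure_hardware(console, edit_during_setup=None, recheck=False):
    """Return the mode the hardware ends up configured for.

    The mode is read once, when data entry is flagged complete. If the
    operator edits the prescription while setup is still in progress, the
    snapshot is stale unless a consistency check is made before proceeding.
    """
    if not console.entry_complete:
        raise ValueError("data entry has not been flagged complete")

    configured_mode = console.prescribed_mode  # snapshot taken once

    # Simulate the operator editing the prescription during the setup window.
    if edit_during_setup is not None:
        console.prescribed_mode = edit_during_setup

    # The safeguard the flawed logic lacked: compare before proceeding.
    if recheck and configured_mode != console.prescribed_mode:
        raise RuntimeError("prescription changed during setup; restart setup")

    return configured_mode


console = Console(entry_complete=True)
print(configure_hardware(console, edit_during_setup="electron"))  # xray (stale)
print(console.prescribed_mode)                                    # electron
```

With recheck=True the same sequence raises an error instead of silently proceeding with the stale mode, which is the kind of inconsistency check the paragraph above says was missing.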

However, without proper software engineering practices, writing new code instead of reusing potentially flawed code would not have guaranteed safety either. For example, the authors of Therac-25 developed a stand-alone treatment operating system that was solely responsible for maintaining safety. The computer monitored the interlock system, which removed power from the unit when a hardware malfunction was diagnosed, causing a pause or suspension of operation. During the Set-Up Test routine, a shared variable, Class3, was used to indicate whether all the relevant parameters were consistent with the treatment before the beam was released; a value of zero meant that they were consistent. Unfortunately, Class3 was incremented on each pass through the Set-Up Test code, so on every 256th pass it overflowed to zero, the test was skipped, and upper collimator faults went undetected. This bug, the result of unsafe programming practices, allowed the device to be activated in an erroneous setting and was the direct cause of the Yakima and Hamilton accidents.
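The overflow can be reproduced in a few lines. The sketch below assumes that Class3 occupies a single 8-bit byte and that a value of zero causes the collimator check to be bypassed; the function name and the fixed_flag parameter are hypothetical, and the byte is simulated with a mask rather than taken from the original code.

```python
BYTE_MASK = 0xFF  # assumption: Class3 is stored in a single 8-bit byte


def passes_that_skip_check(num_passes, fixed_flag=False):
    """Return the pass numbers on which the collimator check is skipped.

    fixed_flag=False models the flawed pattern: the flag is incremented on
    every pass, so it wraps around to zero. fixed_flag=True models setting
    the flag to a constant nonzero value instead.
    """
    class3 = 0
    skipped = []
    for pass_number in range(1, num_passes + 1):
        if fixed_flag:
            class3 = 1  # a constant nonzero value can never wrap to zero
        else:
            class3 = (class3 + 1) & BYTE_MASK  # one-byte increment: 255 -> 0
        if class3 == 0:
            # Zero is read as "parameters consistent", so the check is bypassed.
            skipped.append(pass_number)
    return skipped


print(passes_that_skip_check(600))        # [256, 512]
print(passes_that_skip_check(600, True))  # []
```

Assigning a constant rather than incrementing removes the wraparound entirely, which is why the flag-as-counter idiom is the unsafe part of this pattern.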

As demonstrated above, the outcome of code reuse in Therac-25 is not representative of what is to be expected from code reuse in general. The tragic accidents should be attributed to the lack of testing, since the newly written Set-Up Test routine contained life-endangering bugs as well. AECL was overly confident about the robustness of the software despite the lack of documented specifications and a test plan. They argued that "Program software does not degrade due to wear, fatigue or reproduction process" and hence did not include software in the fault tree analysis. Unfortunately, it was these naively uncritical assumptions about the software's performance, and not code reuse in particular, that led to the disastrous accidents of Therac-25. In general, one should have the same expectations for newly written code as for reused code: both are bound to be error-prone and require rigorous testing.