About the Author

Drew Dennison was born and raised in the western suburbs of Chicago, Illinois. He is a member of the Class of 2013 and is majoring in Electrical Engineering and Computer Science. Drew enjoys being involved in many activities around campus and is part of the MIT Sport Pistol Team.

"Crash" was an investigative essay that explored why computers have not reached the level of maturity where computer crashes are a thing of the past. This essay was written for a non-technical audience and was based on his experience in solving computer crashes for people who "just wanted the thing to work so they could work."

Crash

by Drew Dennison

Last summer, I installed a new printer on the family computer and everything seemed to be working fine. Then a few weeks later, the computer ran a system update and would not start without an error. Every time I rebooted the computer, about one-quarter of the way through, the monitor would turn an all-too-familiar shade of royal blue with white letters.

I felt frustrated because I knew that I had been the only person using the computer that day and was confident user error was not to blame this time. Furthermore, the computer had started beautifully that morning and then a status message from HP Update had reported that I had one update to apply. I clicked "OK" and watched as the update utility downloaded and installed the update. Throughout the process, no alerts indicated that anything had gone wrong or that this update wasn't just an ordinary security patch.

After the update had installed, a message said that the computer must be restarted - another annoyance. I clicked "OK" again because it was the only option. As I sat there waiting for the computer to restart, I noticed that it had hung during the boot sequence and then automatically restarted, only to arrive at the same error message. The computer seemed to be stuck in this loop and no number of hard resets helped. I tried using the F8 option to boot into safe mode to see if I could disable the problematic driver. Unfortunately, the computer still blue-screened on startup, even in "safe mode."

Quickly I ran down a list of possible causes and checked for any obvious hardware failures, but nothing seemed amiss. I downloaded a bootable virus scanner from McAfee to check for a virus, but found nothing suspicious. Also, the computer hardware seemed to be working normally, but just to make sure, I used a CD to boot the computer into Ubuntu 9.04. Everything acted normally when the computer was running Ubuntu instead of Windows. I realized that I could spend all Saturday trying to diagnose the crash, but I had no guarantee of actually solving the problem. It seemed easier and much faster just to restore the computer from a hard drive image I had made earlier that week.

This experience was a relatively minor one in my years of dealing with computer crashes and other computer malfunctions, but it caused me to think about why this even happened. Shouldn't there be a way to design an operating system so that one botched driver update couldn't bring down the entire system? What causes computer crashes? Are they simply an inherent part of the incredibly complex machines that computers are, or can we build a computer that "just works" like a toaster or microwave?

So what is a computer crash? Traditionally a computer program is said to have "crashed" when an error in the programming code, a bug, causes the program to quit unexpectedly. This behavior of quitting unexpectedly is found in both operating systems and applications. When Windows crashes, the Blue Screen of Death appears. Applications like Firefox usually just disappear when they quit. Another type of malfunction is a computer freeze, which occurs when the computer or application simply stops responding to user input. This is when the famous Ctrl+Alt+Del would sometimes help.

Whether the computer crashes, freezes, has a hardware issue, is infected with a virus, or has any other major malfunction, the end result is the same for users because it interrupts their work and wastes time. Therefore, instead of counting only complete system stop errors as crashes, I will treat any major deviation from the way a computer is supposed to work as a crash.

Chronicled in most computer history books and the biography section of the U.S. Navy website, the first report of a computer bug was entered into a lab notebook on September 9, 1947 at Harvard by Lieutenant Grace Hopper. A moth was stuck in Relay #70, Panel F, of the Mark II Aiken Relay Calculator. Although the term "bug" was already in use at the time, Hopper cemented its use when she wrote in the log "First actual case of bug being found" and spread the word that her team had "debugged" the computer (U.S. Navy).

Another milestone in the history of crashes would be the first malicious personal computer crash. According to "A 20-year Plague," an article by CNet, a major computer news and review website, the first computer virus was created in 1982 by a ninth-grader as a way of playing jokes on his friends. Rich Skrenta modified the floppy disks of his friends' computer games so that the computer would reboot every few times the game was played or would display humorous messages from Skrenta. Soon friends refused to lend Skrenta their floppies out of fear of his tampering. Determined to carry out his jokes, Skrenta wrote a "cloner" program, Elk Cloner, that infected Apple IIs and spread through standard floppies. Elk Cloner restarted computers every fifth time they booted up (Lemos). Clearly, the security of the computer was weak if an unknown program could force the computer to restart without the user's consent, which is in effect a crash. Though in 1982 a few computers were crashing because of a ninth-grader's program, viruses at this time spread relatively slowly and could be contained. Since then, computer viruses have become widespread, with millions of different varieties swarming the Internet. So a pressing question about crashes today is, "How do the malicious programs we see today spread over millions of computers?"

The first massive overnight computer crash was the malicious work of the Morris worm. This worm, a virus that spreads without user interaction, is almost universally considered the first computer virus that didn't depend on the physical infection of floppy disks, but rather spread over the Internet. Robert Tappan Morris, a first-year graduate student at Cornell, launched his worm on November 2, 1988 (Schmidt and Darby). The worm overloaded the Internet, and many people disconnected from the Internet during that time to avoid infection (Schmidt and Darby).

The Morris worm exemplifies what malicious computer software (malware) has historically done to computers to cause them to behave in an unexpected manner. Today the problem has become far worse. For example, the leading anti-virus maker, Symantec, better known for its Norton Anti-Virus, found 1.7 million new viruses in 2008 alone (Symantec). While some may say that viruses are not really computer crashes, but are security vulnerabilities, I think that many computer crashes are a direct or indirect result of computer viruses, and that malware should be considered as part of the reason that computers crash.

While malware is a problem, I personally have found from helping family and friends fix their computers that more often problems are caused by people adding more and more programs and utilities to their systems. Very rarely do I find a real hardware issue. Instead it is usually a case of a corrupted driver. My findings seem to parallel the research of a major computer reliability survey that found in 2006 that PC manufacturers had reduced annual hardware failure rates by 25 percent over the previous two years. Annual first-year failure rates for desktops dropped from 7 percent to 5 percent and fell from 20 percent to 15 percent for laptops (Gartner). While hardware failure was a bigger issue in the past, I believe that it is rare enough now that it only accounts for a minority of computer malfunctions. In addition, unless manufacturing processes drastically improve, it is unlikely, in my opinion, for there to be a marked improvement in hardware reliability, and therefore we should focus on software to help minimize computer crashes.

If hardware issues can mostly be ignored, then how do we fix the remaining problems? User errors can be reduced with more training. In addition, a consistent and intuitive interface could minimize the chance for mistakes by the operator. However, I believe significant responsibility for a reliable computer rests with the software company that designed the operating system (OS). In my opinion, a stable OS is going to contribute the most to minimizing computer crashes because the OS is the foundation that applications are built on. Almost every time an application needs to open a file, interact with the user, print something, etc., the application relies on the OS through application programming interfaces to perform these tasks (PC Magazine). Without a reliable OS, it would be hard for programmers to write programs that don't crash. Therefore, a stable OS will reduce both system and application crashes. Perhaps if computer companies focused on the overall reliability and stability of their OS's, we could build computers as easy to use and reliable as phones, radios, and televisions.

The major operating systems are Microsoft's Windows, Apple's Mac OS X, and Linux. I find that many people view computers through the lens of Apple's "Get a Mac" ads that portray PCs with Windows (especially Windows Vista) as slow and prone to failure and Macs as shiny, hip, and crash-resistant. I have read numerous studies in technology review magazines and blogs that support this perception, but most of the data seemed biased, which made it hard to draw good conclusions about whether Macs are actually more reliable than Windows PCs. So instead I looked at how the companies operate.

Microsoft focuses on building large business solutions such as its Exchange server, Active Directory, and IIS web server, and software development tools for programmers (Microsoft, "Enterprise Solutions from Microsoft"). Because operating systems are just one part of Microsoft's business and Windows must maintain compatibility with older business software, the company makes only small changes to Windows. Furthermore, Microsoft does not control the end product and so loses final say about the quality of the software integration with hardware from Dell, HP, Lenovo, and many other computer manufacturers (Curtis). Additionally, because each PC manufacturer has many models with various hardware specifications, Microsoft must build Windows to run well on everything from netbooks to powerful gaming machines. All of the different hardware requires device drivers that are usually written by the hardware manufacturer. This further reduces the quality of integration between Windows and the hardware. Apple, on the other hand, designs both the hardware and software in its products, which allows Apple to thoroughly test and target its Mac OS X to run well on the specific hardware it was designed for (Curtis).

Apple generally targets the home and small business market, perhaps because large corporations often build their own software using Microsoft professional developer tools. This corporate preference, along with the fact that Microsoft is willing to sell Windows on computers with lower price points, gives Windows a market share of about 93% and gives Apple about 5% of the global market for operating systems. A much smaller minority of the market, less than 1%, runs a variant of Linux (Net Applications). Because of Apple's remarkable advertising, more home users may switch to Apple, but most business buyers will probably continue to stick with Windows simply because it would be too expensive to retrain employees, transfer all of their computers to Macs, and redevelop the business software the companies use. Therefore, I think the market has pretty much stabilized. Because Microsoft and Apple know that most of their customers are unlikely to change operating systems regardless of upgrades, the companies lack motivation to improve them. Instead, today, in operating systems, we have the same basic design from the 1980s with a prettier graphical user interface (Microsoft, "Windows History").

One big change looming for computer operating systems is the introduction of a cloud OS, an operating system designed for the modern era that assumes a nearly continuous connection to the Internet. The goal of a cloud OS is for all of a user's data such as photos, music, documents, bookmarks, and emails, along with the applications, to be stored on the Internet. This means that a cloud OS makes it easy for the user to access his or her files and programs from anywhere with an Internet connection. A cloud OS will work without the Internet for short periods of time, but is most useful with an Internet connection. A few years ago, this would have been impossible. However, now broadband Internet is almost universal and cellular networks give high-speed Internet access almost anywhere. Reliable access to the Internet will only improve with time, so a cloud OS is starting to make sense. Additionally, old business applications are slowly being replaced by online applications such as Salesforce.com. Eventually the most important function of the computer will be to get people online via the browser.

Google is currently working on its own cloud OS, Chrome OS, which was recently released to computer enthusiasts for testing. Chrome OS is essentially an operating system that consists of little more than a web browser. There will be no software to install and no maintenance to perform. Turn on the computer and have almost instant access to the Internet! The main draw of Chrome OS is that it transfers the responsibility of maintaining the operating system from the user to Google. Because the computer is connected to the cloud (the Internet), it can automatically download any patches as needed. This feature should greatly enhance the reliability and security of computers. If any bugs are discovered, every Chrome OS computer will be updated automatically. At first glance, this might seem like a bad idea because one cause of computer crashes is a bad update. However, updating Chrome OS will be different from the current update process because instead of relying on users to update their computers or fix corrupted programs, Google will push out the updates to Chrome OS automatically. This will create the same homogeneous environment that Google has perfected for its search engine computers. A uniform system is what gives Google and Gmail their 99.99% uptime (Royal Pingdom). Google plans to sell Chrome OS pre-loaded on computers that are similar internally. This standardization of hardware will allow Google to ensure that every Chrome OS computer behaves identically. Hopefully new hardware standards will cut the number of crashes resulting from programming bugs or malware to almost zero because there will be almost no functional differences in the computers (The Chromium Projects, "Software Architecture").

One source of crashes is a corrupted hardware device driver. Some device drivers have to operate in a privileged mode that gives them the power they need, but this usually results in a total crash if something goes wrong. To help eliminate some of the traditional sources of computer crashes such as corrupted device drivers and malware, Google is using a mathematical error check that can detect if any part of the OS has been changed or corrupted. If there are any unexpected changes, a fresh copy of the OS is downloaded and installed automatically (The Chromium Projects, "Verified Boot").
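
For readers curious what such a "mathematical error check" might look like, here is a minimal sketch in Python. It is only an illustration of the general idea of comparing a fingerprint of the installed system against a trusted fingerprint, not Google's actual design. The choice of SHA-256 and the names digest, verify_and_repair, and fetch_fresh_copy are all assumptions made for this example.

```python
import hashlib
from typing import Callable


def digest(data: bytes) -> str:
    """Return a SHA-256 fingerprint of the data (the hash choice is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def verify_and_repair(installed: bytes,
                      trusted_digest: str,
                      fetch_fresh_copy: Callable[[], bytes]) -> bytes:
    """Return a copy of the OS image whose fingerprint matches the trusted one.

    fetch_fresh_copy is a hypothetical stand-in for downloading a clean image.
    """
    # An unchanged image produces exactly the same fingerprint as before.
    if digest(installed) == trusted_digest:
        return installed

    # Any change, accidental or malicious, alters the fingerprint,
    # so the image is replaced with a freshly fetched copy.
    fresh = fetch_fresh_copy()
    if digest(fresh) != trusted_digest:
        raise RuntimeError("fresh copy also failed verification")
    return fresh


# Usage: a single changed byte is enough to trigger the repair.
clean_image = b"operating system image"
trusted = digest(clean_image)
repaired = verify_and_repair(b"operating system imagE", trusted, lambda: clean_image)
```

The point of the sketch is that the check does not need to know what went wrong. It only needs to notice that the installed copy no longer matches the trusted fingerprint.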

In addition to the uniform reliability and security of Chrome OS, its user environment is simple. The simple user interface will allow for better training for computer users everywhere and likely decrease the number of computer crashes from user error. Another advantage of Chrome OS is that your data will be stored on a remote server so if your computer hardware fails or your computer is stolen or destroyed, you can simply access your account from another computer and download the data.

While some would argue that storing data on the cloud is less secure, and that is true in theory, the response Google has given is that large companies like Google have a team of security professionals and the resources to better secure the data than an individual or a small business (Feigenbaum). In addition, Google has released the source code to Chrome OS so that anyone can see exactly what they are doing. I believe that Chrome OS will actually enhance both security and privacy by reducing the threat of malware that steals your private data and monitors you. Finally, other companies are sure to follow Google's lead and develop their own cloud operating systems that may give better security and privacy if that is a concern.

One potential downside to Chrome OS is the lack of compatibility with existing computers. This is unfortunate. However, if Chrome OS is going to be successful, there will have to be the draw of consistency and stability that can only arise if future hardware is built to open standards. For example, there should be no need to install a printer driver simply because a new printer is plugged in. Instead, if all printer manufacturers agreed on a common standard for communication between the computer and the printer, Google could write a universal printer driver that would work with every printer built to the common standard. This is the way USB flash drives and Wi-Fi work today.
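
The idea of a common standard can be sketched in a few lines of Python. This is a hypothetical illustration, not a real printing standard: the names StandardPrinter, send_page, LaserPrinter, InkjetPrinter, and universal_print are all invented for the example. The one "universal driver" function works with any printer that follows the agreed-upon interface.

```python
from typing import List, Protocol


class StandardPrinter(Protocol):
    """The hypothetical common standard: every printer accepts a page of text."""

    def send_page(self, text: str) -> str:
        ...


class LaserPrinter:
    """One manufacturer's printer, built to the common standard."""

    def send_page(self, text: str) -> str:
        return f"laser: {text}"


class InkjetPrinter:
    """A different manufacturer's printer, built to the same standard."""

    def send_page(self, text: str) -> str:
        return f"inkjet: {text}"


def universal_print(printer: StandardPrinter, pages: List[str]) -> List[str]:
    """A single 'universal driver' that needs no printer-specific code."""
    return [printer.send_page(page) for page in pages]


# Usage: the same function drives both printers.
laser_output = universal_print(LaserPrinter(), ["page one"])
inkjet_output = universal_print(InkjetPrinter(), ["page one"])
```

Because the operating system would talk only to the standard and never to a specific model, plugging in a new printer would not require installing any new software.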

Another potential drawback to Google Chrome OS is that everything is done inside the web browser and there are no programs to install locally. For example, there would be no way to install or use programs like Word, Excel, or Photoshop. Google could allow the installation of local apps, but then this would open the door to malware and applications that crash or freeze because of a conflict or a botched installation. Instead Google hopes for everything from email and documents to photo and video editing to be done online. This is already feasible in some cases with online apps such as Gmail and Google Docs, but for true native performance and functionality, the web programming languages and methods will have to be developed further.

While there have been tremendous advances in computers over the last three decades, we are still using computers that have at the core an operating system designed for the era before the Web. As the computer matures, we have a chance to pursue the dream of being able to work on a computer anywhere and then go to any other computer to access the same documents, emails, photos, spreadsheets, and programs with a seamless experience. This is the vision of the cloud OS. Even though Google Chrome OS will not be perfect, it is a step in the right direction towards the dream of crash-free computers.


Works Cited

Curtis, Jerry. Why Mac is better than PC. n.d. 28 July 2010.
<http://www.helium.com/items/1724245-mac-is-better-than-pc>.

Feigenbaum, Eran. "Secure in the Cloud." 2 November 2009. Google European Public Policy Blog. July 2010.
<http://googlepolicyeurope.blogspot.com/2009/11/secure-in-cloud.html>.

Gartner. "Gartner Says Annual Failure Rates of PCs Are Improving, but Manufacturers Can Do Better." 26 June 2006. Gartner. 28 July 2010.
<http://www.gartner.com/press_releases/asset_154164_11.html>.

Lemos, Robert. A 20-year Plague. 25 November 2003. 28 July 2010.
<http://news.cnet.com/2009-7349_3-5111410.html>.

Microsoft. Enterprise Solutions from Microsoft. n.d. 28 July 2010.
<http://www.microsoft.com/enterprise/default.aspx>.

Microsoft. Windows History.
<http://www.microsoft.com/windows/WinHistoryProGraphic.mspx>.

Net Applications. Operating System Market Share. 1 December 2009. 1 December 2009.
<http://www.netmarketshare.com/operating-system-market-share.aspx?qprid=8>.

PC Magazine. API Definition from PC Magazine Encyclopedia. n.d. 29 July 2010.
<http://www.pcmag.com/encyclopedia_term/0,2542,t=application+programming+interface&i=37856,00.asp>.

Royal Pingdom. Google availability differs greatly between countries. 26 September 2007. 28 July 2010.
<http://royal.pingdom.com/2007/09/26/google-availability-differs-greatly-between-countries/>.

Schmidt, Charles and Tom Darby. The What, Why, and How of the 1988 Internet Worm. July 2001. 28 July 2010.
<http://www.snowplow.org/tom/worm/worm.html>.

Symantec. "Symantec Global Internet Security Threat Report."

The Chromium Projects. Software Architecture. n.d. 28 July 2010.
<http://sites.google.com/a/chromium.org/dev/chromium-os/chromiumos-design-docs/software-architecture>.

The Chromium Projects. Verified Boot. n.d. 29 July 2010.
<http://sites.google.com/a/chromium.org/dev/chromium-os/chromiumos-design-docs/verified-boot>.

U.S. Navy. US People--Hopper, Grace Murray. n.d. 28 July 2010.
<http://www.history.navy.mil/photos/pers-us/uspers-h/g-hoppr.htm>.