We thought we had this stuff figured out back in the 1970s. What went wrong? Jerome H. Saltzer, MIT (outline for a keynote talk at the 2006 ISSSE) {slide 1: title}

Starting around 1964, some colleagues and I spent ten years working on Multics, one of the first multiple-user computer systems that took a serious interest in allowing users to control the extent to which they shared information. That seems to be the reason why I was invited to give this talk, because for most of the time since then I've been working in other system design areas. Since I'm pretty far out of date on computer security, please be gentle if I say some things that everyone in the audience views as obvious, old-fashioned, or naive.

Computer security involves a lot of different things. The subject of this symposium might be summarized as "getting the code right". That is an important component of security, but security is a systems problem, to which many different components contribute. I would like to look at a different component, starting with a little bit of history, with the goal of trying to figure out how we got to where we are. Something to keep in mind as I speak is that the real world is vastly complicated, and that I'm going to ruthlessly oversimplify in order to emphasize a few points. But my points have to do with a collision between a fundamental design principle and some irresistible forces, so I claim that simplification is appropriate.

My plan is to talk about these things: {slide 2: bullets}
- what we thought we knew in 1975
- what happened next (PC's and the Internet)
- what went wrong (neglect of complete mediation)
- what happened in UNIX (buffer overflows)
- what went wrong (neglect of complete mediation)
- why? (things are not going to get better)

The essence of what I am going to say is that in two different situations, a fundamental design principle was neglected. In the first case, I propose to explore the history in the hope of understanding why it was neglected. For the second case, I'm going to suggest that the way that the design principle applies is not especially obvious. After drilling down to the mechanics, I then want to come back up to the top and see what we might learn from these two examples for the future.

By the mid-1970's, what seemed like some useful models for security had been developed. The primary model was lightweight security for a typical time-sharing environment. {slide 3: two models} It consisted of a matrix model, typically implemented with principals and access control lists and occasionally with capabilities. And there was also a parallel model for more demanding environments such as the military, based on information flow control and non-discretionary or mandatory security.

Mike Schroeder and I wrote a tutorial paper for the IEEE Proceedings in 1975 that proposed a set of security design principles, and it also described several specific security mechanisms. {slide 4: scan of top of first page} One result of that paper is that we are both frequently asked "what would you say differently today?" Obviously, things have moved ahead since then. At the time we wrote that paper, the Internet was still the ARPAnet, {slide 5: ARPANet in 1980} but it was already apparent that one security problem that the ARPAnet brought to the forefront was authentication: before taking any action on an incoming message, you need to know who sent it to you.
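To make that requirement concrete in today's terms, here is a minimal sketch in C of the kind of check I have in mind: verify a keyed message authentication code on an incoming message before acting on it. The function name and framing are illustrative only (they are not from the 1975 paper), the sketch leans on OpenSSL's HMAC routines, and it assumes the sender and receiver already share a key by some means.

    #include <stddef.h>
    #include <openssl/evp.h>
    #include <openssl/hmac.h>
    #include <openssl/crypto.h>

    /* Illustrative only: verify a keyed MAC on an incoming message before
       acting on it. How the two ends came to share the key is assumed to
       have been settled elsewhere. */
    int message_is_authentic(const unsigned char *key, size_t key_len,
                             const unsigned char *msg, size_t msg_len,
                             const unsigned char *tag, size_t tag_len)
    {
        unsigned char computed[EVP_MAX_MD_SIZE];
        unsigned int computed_len = 0;

        if (HMAC(EVP_sha256(), key, (int)key_len, msg, msg_len,
                 computed, &computed_len) == NULL)
            return 0;                       /* library failure: reject the message */

        return tag_len == computed_len &&
               CRYPTO_memcmp(tag, computed, computed_len) == 0;  /* constant-time compare */
    }

Only if a check along these lines succeeds should the receiver go on to act on the message.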
At the time it was pretty obvious that cryptographic technology was appropriate for authentication, so the paper mentioned that, but that technology was almost entirely classified, so there wasn't much that we could say about it. A modern version of that paper would probably spend substantially more space discussing how to know for sure the source of each incoming message or stream segment. Incidentally, it would also take care to better separate the mechanics of cryptographically-provided authentication from the mechanics of cryptographically-provided secrecy.

As for other topics, a modern version of that paper would probably have a good-sized section on how one develops trust in a principal and, more generally, on what you are relying on when you trust the output of a computer system. It would also put more emphasis on the iterative nature of security system design: lay out the threats and specify countermeasures, design feedback into the system, and then, once the system is running, use the reports generated by the feedback to identify unanticipated threats and inadequate implementations and iterate the specification, the design, and the implementation. This notion of iterative security design underlies the saying popularized by Bruce Schneier that "security is a process, not a product". The paper also missed a few other things, such as defense in depth, but on the whole it did identify a lot of things that still seem to be important to security. It offered both design principles and concrete methods for dealing with not all but many of the security problems that we have today.

So much for what we thought we knew. {slide 6: "progress report"} It is 30 years later. What happened next? We are finding that there are massive shortcomings in computer security, with bot armies, denial-of-service attacks, viruses, worms, phishing expeditions, buffer overflows, and untraceable spam all causing major problems. To illustrate, here is just one day's activity on Bugtraq. {slide 7: Bugtraq--check symantec, f-secure, sophos & McAfee}

So what went wrong? I have an interpretation that I'm about to offer. It is not because the matrix model doesn't work, though that interpretation may have some legs. It is not because the security design principles were wrong. They still seem to be holding up pretty well. It is because one of the most important security design principles has been neglected. The principle I'm thinking of is called "Complete mediation". {slide 8: the three questions} It says that before you take a requested action you should be sure to know who requested it, that the request hasn't been tampered with, and that the requester has permission to make that request. Briefly, you should verify authentication, integrity, and authorization.

The real question, then, is {slide 9: stating question}: "Why was this design principle, which is so fundamental to security, neglected?" I think that three things conspired to produce this situation {slide 10: overlay with answers}:
1. The price of a personal computer dropped more rapidly than anyone expected, creating a huge market for cheap computer systems that were unprepared for security.
2. When the Internet was opened to commercial use, it moved from backwater to major industry more rapidly than anyone expected, creating another, this time huge, system that, for different reasons, was unprepared for security.
3. The U.S. government (along with several other governments) effectively blocked one primary countermeasure that could have provided network security.

Let's talk about these three things one at a time...

1. Technology change. Around 1977, the push to create personal computers started. The technology wasn't quite ready yet, but by 1980 there were enough early adopters around who were willing to give it a try, despite $5K price tags. The result was that the first personal computers were built with every possible price-cutting shortcut. Omitting all consideration of security seemed like an acceptable shortcut, because of the ground rule that a personal computer has just one user. So there was no security in the hardware. And there was no security in the software. Technology improvements over the next decade focused on bringing the price down rather than on bringing the function up, and that focus led to development of a mass market. There were certainly people thinking about security, but they weren't the ones driving the market. Furthermore, in the dominant part of that mass market, the one-user ground rule continued to apply, so by 1993 there was still no security in the hardware and no security in the software. By then, personal computers had become a major industry, and it also started to occur to people that a personal computer might be more useful if it were attached to the Internet. Unfortunately, that attachment changes a crucial ground rule. Just because you have only one physical user doesn't mean that you have only one user. Arriving messages represent other users, and acting on those messages carries a risk. The principle of complete mediation says that before you act on a request you should know with confidence who made the request. But the interest in attachment dominated whatever awareness there was that the lack of security in the personal computer might create a problem. The arrival of the Internet takes us to item #2, the communications revolution.

2. The Internet was originally dismissed as irrelevant by all of the big communications companies, so it developed in a kind of backwater in universities and government labs until it was not only clearly useful, but clearly a big winner. Once it came to the attention of the rest of the world (around 1994) it took off like a house afire. The result was that the protocols that happened to be in place in 1994 became for all practical purposes cast in stone. We are still using TCP, UDP, SMTP, POP, DNS, etc., with only minor tinkering, 15 years later. But none of those protocols comes with significant security attached. Again, at the time, it was pretty well understood how security could have been attached to those protocols. The security vulnerability of those protocols (particularly the observation that an arriving message does not routinely come with any credible form of authentication) takes us to item #3.

3. Government interference. It was apparent in the late 1970s, when the first Internet protocols were being developed, that providing security of network communications was a good idea, even though not particularly required in the cooperative environment at that time. It was also apparent that cryptographic techniques for authentication and confidentiality were the way to do it (people hadn't quite gotten encryption and authentication distinctly separated in their minds the way they have today, but that is a side issue, not the main concern.)
The data encryption standard called DES was adopted by the National Bureau of Standards (now NIST) in 1976. Rivest, Shamir and Adleman came along with the practical public-key system known as RSA in 1977. Our 1975 paper suggested using cryptographic techniques both for authentication and secrecy, but it also pointed out that nearly all interesting work on the topic was classified. In fact, there were suggestions from the outset of the Internet that the default for all Internet protocols should be that they be cryptographically secured. But these suggestions were viewed as infeasible for three reasons:
- cryptographic hardware was too expensive
- cryptographic software was too slow
- you couldn't export hardware or software that used cryptographic techniques

There were, of course, lots of other problems to work out: how to distribute keys or develop a public key infrastructure, devising protocols that didn't accidentally compromise the cryptography, avoiding man-in-the-middle attacks, etc. But the urgency in tackling those problems was low because of the third consideration. And several teams did figure out how to solve those problems. They developed secure protocols. But those protocols never made it out of the lab. Why? Well, here is one example from personal experience...

At MIT in 1986, using DES as the cryptographic engine, we developed the authentication system called Kerberos to provide single login to an array of different network services. At the same time, we developed the X Window System, a virtual display interface that allowed the display and the application program to be separated by a network. We made both of these things available to the world under an open source license. The X Window System rapidly became a standard interface for nearly all UNIX-based engineering workstations. But Kerberos hardly made a dent. We suggested Kerberos to several different vendors of computer systems. Every vendor we talked to was enthusiastic about the concept but said that they couldn't touch it, because of the export restrictions. The immediate problem they brought up was that they couldn't justify investing in a product that (1) could be marketed only within the United States and (2) came with a requirement that you tell your customers that misuse could lead to jail time. Discussions with the government agencies involved produced a response along the lines of "The natives can use blowguns; we don't think they should be armed with automatic weapons."

Result: When PC's were attached to the Internet, (1) the PC's had no internal mediation, and (2) the Internet offered no way to tell for sure who just sent you a message, so there was no way to perform mediation on incoming messages. This combination laid the groundwork for today's massive worm outbreaks, phishing, spam, and other malevolent activities.

One can imagine a completely different scenario. Suppose that, instead of taking the attitude that "the natives can use blowguns", the U.S. government had strongly encouraged the use of cryptographic techniques in the early development of the Internet, and eliminated export restrictions on DES-level cryptography. The result might have been:
- The early infeasibility arguments for default cryptographic authentication of network protocols might have been reversed. Despite slow software, the prospect that hardware was getting cheaper might have encouraged inclusion of cryptographic protection of the Internet protocols and key distribution methods, perhaps with low-powered encryption at the outset, planning to insert higher-powered stuff later. Vendors would have seen their way to embracing developments like Kerberos. Since there would be the potential of a world-wide market, the number of DES hardware chips manufactured might have been large enough that the price came down to the point where at least high-end workstations would have made it a standard feature. Once that toehold was established, by now the function arguably would have worked its way down to the cheapest PC, and we would know the source of every arriving message.
- With network protocols automatically providing authentication of every incoming frame, packet, or message, retrofitting firewall security on PC's and PC operating systems that otherwise had none probably would have been straightforward. There would certainly still be problems of getting the design and the code right, and vulnerabilities would still exist. But the machinery to systematically tackle those vulnerabilities would be in place.
- If the O/S knows with confidence the source of every incoming message, the worms, phishing expeditions, and much of the spam of today would be a hypothetical concern rather than a day-to-day headache. There would still be problems, of course, but my guess is that they would be localized, and not nearly so pervasive.

---------

Now let's shift gears and discuss the historical accidents that made UNIX vulnerable to buffer overflows. {slide 11: "next"/UNIX} I am going to suggest that once again, defeat was snatched from the jaws of victory when complete mediation was neglected, though in a more subtle way. As I said earlier, the application of the principle of complete mediation to this case is not especially obvious. At least, I didn't find it obvious at first. Perhaps you will.

UNIX starts out with what seems like a fighting chance to provide security. It uses virtualization to partition a computer system to keep multiple users properly separated. It has principals and a simple form of access control list, and various other mechanisms that provide moderately complete mediation. But it has proven vulnerable to buffer overflows. There are several issues here. They are well-known in this community (if you Google for "stack smashing" you get 100,000 hits) and an amazing number of patch schemes have been devised, but I want to review the problem quickly because I'm going to propose a slightly different analysis of it.

First, why do programmers make this mistake so often? The basic strategy of relying on the programmer to do bounds checking is intrinsically risky. Humans make mistakes, they misevaluate the situation, they take shortcuts. In this case, a typical shortcut found at the scene of an accident is the use of gets() rather than fgets() to copy a string from a field of an incoming packet into a buffer. When you copy a string, you need to know how many bytes to copy. fgets() takes an argument in which the programmer specifies the maximum string length that the buffer can absorb. gets() takes its instruction on the number of bytes from the string itself. But the string may be supplied by an adversary. (Using gets() is thus already an example of neglecting complete mediation, though it is not the primary one I am concerned about.)
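To make the contrast concrete, here is a minimal C sketch of the two calls. It is my own illustration rather than code from any particular system; the buffer size is arbitrary, and most compilers will (rightly) complain about the gets() call, since that function cannot be used safely at all.

    #include <stdio.h>
    #include <string.h>

    enum { BUFSIZE = 64 };

    /* Risky: gets() copies until it sees a newline or end of input, so the
       amount copied is decided by whoever supplied the input, not by the
       programmer. Input longer than BUFSIZE - 1 characters overruns buf. */
    void risky(void)
    {
        char buf[BUFSIZE];
        gets(buf);                          /* no bound at all */
        printf("read: %s\n", buf);
    }

    /* Better: fgets() takes the buffer size, so the copy is bounded, but the
       programmer still has to supply that size argument correctly. */
    void bounded(void)
    {
        char buf[BUFSIZE];
        if (fgets(buf, sizeof buf, stdin) != NULL) {
            buf[strcspn(buf, "\n")] = '\0'; /* strip the trailing newline, if any */
            printf("read: %s\n", buf);
        }
    }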
fgets() is certainly better, but the programmer still has to specify the maximum string length correctly, so you can still screw up. The real problem is depending on the programmer to do accurate bounds checking, rather than having the system do it automatically.

Second, why are such bugs so easy to exploit? {slide 12: buffer overflow} One reason is that the typical UNIX stack grows toward lower memory addresses, but arrays grow toward higher memory addresses. The result is that an array overflow tends to overwrite older things in the stack. One of the older things is the return point of the currently running procedure. Bingo, an easy exploit. The adversary simply overwrites the return point with an address of his or her choosing and when that procedure returns, the adversary gains control.

Well, why aren't buffer overflows caught automatically? Neither the hardware nor the implementation language offers any support. There are languages that do bounds checking, but UNIX is written in C, which relies on the programmer to do bounds checking.

But these observations are about the superficial symptoms. Hiding underneath there is a deeper issue of misplaced trust and lack of complete mediation. {slide 13: "next/went wrong"} The overflowed buffer is just the entry mechanism. The real target of most buffer overflow attacks is the return point of a procedure, or a jump switch, or any other control variable. A control variable has a key property: if its value can be modified by an adversary, that adversary can grab control of the processor. A return point stored in a stack frame happens to be the simplest example, but a computed goto, a procedure variable (funarg), or a dispatch table all present potential vulnerabilities, whether they are stored in the stack or in the heap. And, as some exploits have demonstrated, it isn't even necessary for an adversary to supply code--the usual address space contains plenty of code that will do dreadful things if you jump to it.

One response has been to apply the principle of least privilege: set up permissions so that memory regions that hold code can't be written into and that memory regions that hold data can't be executed. We used that technique in Multics in 1968 and it turned out to be incredibly effective in catching bugs quickly, because they almost always caused an instant protection exception, rather than muddling on and causing real damage that is hard to trace. Setting permissions narrowly is certainly a move in the right direction, and it makes buffer overflows harder to exploit, but it actually doesn't block the basic attack. The control variable is still in a writable location where an adversary may be able to overwrite it and thereby cause a transfer of control to some place you didn't intend. Another response has been to try to protect the control variables from tampering, using techniques such as canaries and rearranging the order of variables in the stack. Protecting control variables is getting much closer to the heart of the problem.

The real thing that is going wrong is lack of complete mediation. Let me explain. Remember that the processor should ask three questions of each instruction before executing it: authenticity, integrity, and authorization. {slide 14: return instruction} Let's look at the return sequence:

    load program_counter from return_point

We know which principal supplied the load instruction, because it came from a program that the current principal is running.
We may even know that the instruction has been authorized, because it came from a region of memory that has execute permission. But what can we say about the integrity of the instruction? {slide 15: integrity emphasis} Has anyone tampered with it? The code itself hasn't been tampered with, because it was (I hope) retrieved from a write-protected region of memory. But its argument, return_point, is vulnerable to tampering. Has the value of return_point changed since it was stored at call time? Here is our neglected mediation. To pick up a variable in memory and transfer to that address is risky unless you have complete trust in its value: you need to know with confidence who set that variable. Failure to check that out is a failure of complete mediation question 2: integrity. It is essentially the same as accepting a message over the Internet knowing who originated it, but forgetting to check whether someone else modified it along the way.

So how do we ask question 2? Once we understand that that is the problem, it isn't too hard to start inventing answers. Canaries are one stab at an answer: if you can't protect the control variable, at least try to detect tampering. Here are three slightly stronger examples:
- At call time, store two or more replicas of the return point in widely separated places, and at return time compare the replicas. Replication is also a weak countermeasure against a determined adversary, but probably would catch many buffer overflows.
- Create a separate call stack that holds nothing but return points. Ideally this stack might be protected from writing by anything but proper call and return instructions, but that protection probably would require some hardware support. And it wouldn't cover procedure variables and dispatch tables.
- At call time, sign the stored return point with a cryptographic key known only to the running thread, and at return time, verify the return point value. More generally, sign all control variable values at the time of writing and verify them at the time of use. (This method would be quite strong, but it complicates thread management--you have to keep track of what key to use--and it might be slow.)

There are undoubtedly other, simpler, and faster ways of doing it; the primary goal is to recognize that since the return point is a writable variable, protecting it from tampering is hard. Rather than trying to protect it, it might be better to take a lesson from the world of communications and verify its integrity. Following this lead, you might wonder if the integrity of all data needs to be checked. That is a potential conclusion, but it may be possible to draw some useful boundaries around the problem. If an adversary can influence the argument of an ADD instruction, it is likely to foul your computation up, but it can be fairly hard for the adversary to systematically take advantage of the situation. But if an adversary can influence the result of a JUMP instruction, exploitation becomes much easier. So focusing attention on the integrity of the arguments of JUMP instructions actually seems like a useful goal.

--------------

I think that I have beaten the dead horse of neglected complete mediation more than it is worth, so let's move on and wrap this thing up. {slide 16: next/analysis} Here are a couple of closing observations.

First: is it over? Are things getting better?
{slide 17: "The future"} Now that Bill Gates has announced that security is important, can we finally relax and expect to see all of the security bugs gradually getting fixed, and fewer new ones arriving? Perhaps we could subcontract security to a handful of paranoid specialists so that we can move on to something more interesting? The systems group at MIT has puzzled over this question and tentatively concluded that the answer must be NO, for two reasons. {slide 18: "The future/No!"}

The first reason is the rapid rate of technology change. Technology continues to improve so rapidly that people rushing to exploit it create new vulnerabilities faster than the rest of us can discover them and root them out. We can see this lesson in the experience of the personal computer and the Internet. Although I pointed a finger at government interference, an equally strong contributor was rapid technology improvement. That is what allowed the PC to become a household item before it was secured, and that is what allowed the Internet to take off so rapidly, freezing its protocols in an insecure state.

The second reason is that there aren't any physical limits on the size and complexity of programs that programmers can write. So even if technology were static, they could still generate code faster than anyone can possibly shake it out for security problems. And subtle problems such as lack of integrity checks on control variables may go unrecognized for a long time. In the 1990's the problem was single-user PC's being attached to a network, and no longer being single-user. Today it is web interfaces to SQL databases, with the risk of SQL code injection. It is active data such as JPEGs that can harbor malicious code. It is AJAX running programs in your browser that (1) you don't know about, (2) run inside a sandbox whose trustworthiness is problematic, and (3) were written by a contract application designer who started work just last week. Each of these things may be individually fixable, but tomorrow it will be something else, invented by yet another of the many well-meaning elves who create and distribute code, or code masquerading as data, in huge quantities.

These two things converge in the following way: It is the nature of the von Neumann architecture that data and program are interchangeable, which means that the number of opportunities for active data and code injection is limitless. At the same time, technology improves, someone rushes in to exploit it, and creates yet another vulnerability. Add to this mixture that most programmers are quite naive on these points, and you have a guarantee of continued employment as a security wizard. If you are looking for a research problem in computer security, this may be a good place to start.

Second: not too long ago, Bruce Schneier wrote a book with the title "Secrets and Lies", which I referred to earlier. In that book he points out that particular security mechanisms such as cryptography aren't by themselves going to solve the security problem. The security problem involves all aspects of the system, including the people who use and misuse it.

Let me give an example. Last month one of my students asked me to write a letter of recommendation for his application to law school. I agreed, and he left a form with my administrative assistant. {slide 19: LSAC form} The next day I stopped by my assistant's desk to pick up the form.
Glancing at it, I noticed that it was preprinted with the student's name, an identification number assigned by the Law School Admission Council, and, to my amazement, his social security number, neatly identified as such. (The print is a little small, so I brought along an enlargement. {slide 20: LSAC detail}) Now, neither my administrative assistant nor I have any reason in the world to know his social security number. But there it was, and a copy of that form is now in my files, where future administrative assistants and archivists will find it.

If you visit the LSAC web site and propose to create an account, it offers a form that requests your SSN. The HELP button next to the SSN field responds with this explanation:

This information is needed to match your online account to your LSAC records. It also allows LSAC to match such items as transcripts, letters of recommendation, score reports, and law school requests to your file. Your Social Security number or Social Insurance number is necessary to obtain your username and password or to reset your password if you forget it.

In other words, everyone who has anything to do with your application will learn your SSN. And by the way, it is also a secret key to your password.

So my closing lesson is that much as I appreciate the importance of getting the code right, I'm afraid that the bigger challenge in computer security is figuring out how to keep people from doing stupid things. {slide 21: real challenge} Our 1975 paper proposed a design principle that it called "psychological acceptability". What that meant is that it should be easy for a user to build a mental model of what the system does, and the operation of the system should closely match that mental model. A modern version of that design principle is sometimes called "the principle of least surprise". But that design principle takes us only a little way. The problem seems to require much more than that, and I don't have a clue how to proceed. Again, if you are looking for a good research problem in security, this appears to be another good place to start.

At this point, I have probably pushed enough hot buttons that I should stop talking and sit down, or at least entertain some questions and discussion. Thank you.