6.033--Computer System Engineering

Suggestions for classroom discussion


Topic: Why are there so many buffer overflow bugs?

By J. H. Saltzer, April 2003, with contributions by several members of the teaching staff.


A question worth exploring in recitation is exactly why buffer overflows are so easily exploited in attacks on Internet listeners.

I think that it can be explained as an unintentional conspiracy of four common practices:

  1. Compilers allocate space for arrays as a contiguous chunk of memory, with the first element at some starting address and successive elements at higher-numbered addresses.

  2. Primarily because there isn't any hardware support for doing anything better, most operating systems allocate a single, contiguous block of address space for a program and its data. The addresses may be virtual, achieved by paging or base-and-bounds, but the important thing is that the program sees a single, contiguous block of addresses.

  3. Faced with this single block of memory, programming support systems, including the support system for UNIX, suballocate the address block into three regions: they place the program code at low-numbered addresses, they place static data and the heap just above the code (with the heap growing toward higher addresses), and they start the stack at the highest-numbered address and grow it downward, toward the heap.

  4. Many programs use the standard C library function
    char *gets(char *s)
    rather than
    char *fgets(char *s, int n, FILE *stream)

    to move character-string data from an incoming stream into a local array identified by the pointer s. The important difference is that gets() reads characters until it encounters a new-line character or end of file, while fgets() adds an additional stop condition: it stops after reading at most n-1 characters, leaving room for the terminating null. Ron Rivest observes that using gets() rather than fgets() is an example of Worse is Better: it is "slightly better to be simple than to be correct."
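As a concrete illustration, here is a minimal sketch of the safe pattern (the helper name read_field is hypothetical, not part of any standard library): the caller passes the buffer's size explicitly, and fgets() enforces it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into a fixed-size buffer.
   fgets() is told the buffer size n, so it stops after at most n-1
   characters and always leaves room for the terminating '\0' --
   exactly the stop condition that gets() lacks. */
void read_field(char *buf, size_t n, FILE *in)
{
    if (fgets(buf, (int)n, in) == NULL) {
        buf[0] = '\0';                   /* EOF or error: empty string */
        return;
    }
    buf[strcspn(buf, "\n")] = '\0';      /* strip the newline, if any */
}
```

An oversized input line is simply truncated at n-1 characters instead of overrunning the buffer.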

The stage is now set. A program that is listening on some Internet port for incoming packets allocates a buffer in its stack, say of size 30 characters, to hold some field from the packet, knowing that the field should never be larger than that. It reads the packet's data from the port (socket) into the buffer using gets(). An attacker prepares and sends a packet in which that field contains a string of, say, 250 characters, so gets() overruns the buffer.

Because of the compiler practice of placing successive elements in higher-numbered addresses, and the program's decision to place the buffer in the stack, the overrun smashes things in the stack at higher-numbered addresses. But because the stack grows toward lower-numbered addresses, the things smashed by the buffer overrun are all *older* things, allocated before the buffer. One important older thing is the variable that holds the return point of the currently running procedure. So the return point may be vulnerable.
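The two directions at work can be observed directly. This sketch is illustrative only: C guarantees that array elements ascend in address, but the stack-growth direction is platform-dependent, not promised by the language.

```c
#include <stdint.h>

/* C guarantees that array elements sit at ascending addresses, so an
   overrun from gets() marches toward higher-numbered addresses. */
int array_grows_upward(void)
{
    char buf[30];
    return (uintptr_t)&buf[29] > (uintptr_t)&buf[0];   /* always 1 */
}

/* Compares a fresh callee-frame local against a caller-frame local:
   a negative result means the stack grows downward.  This is typical
   of the common platforms discussed here, but it is implementation-
   defined behavior, not a guarantee. */
static int deeper_frame(char *outer_local)
{
    char inner_local;
    return ((uintptr_t)&inner_local < (uintptr_t)outer_local) ? -1 : 1;
}

int stack_direction(void)
{
    char outer_local;
    return deeper_frame(&outer_local);
}
```

When the stack grows downward while the array grows upward, the overrun necessarily walks into older frames' saved state, including the return point.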

The standard exploit is thus to include runnable code in the packet, and, knowing offsets, smash the return point stack variable to contain the address of that code.

We can draw several lessons:

  1. Although it has no relevance to the buffer-overflow/stack-smashing exploitation, growing the stack in the direction of the heap is potentially dangerous, because programs usually grow the stack without checking to see if there is room for whatever they are adding. Systems with a paged virtual memory can limit the potential for damage by providing one or more unmapped pages in the region of address space that separates the heap and the stack.

  2. Be explicit. One can interpret the underlying problem with gets() to be that it, as well as many other C-language string-manipulation routines, relies on its context, rather than the program, to tell it exactly what to do. When the context contains contradictions (a string of one size, a buffer declared to be of another size) or ambiguities, the library program may resolve them in an unexpected way. There is a trade-off between convenience and explicitness in programming languages. When security is the goal, a language that demands explicitness is probably safer. (This observation tends to ignite religious battles, of course.)

  3. The hardware architecture can be of considerable help in minimizing the impact of common accidents, and thus in reducing the number of exploitable bugs. Consider, for example, an architecture that gives the user program distinct, hardware-enforced memory segments (for example, one segment for program code, a second segment for the heap, and a third segment for the stack):

    The Multics system hardware provided such segments, stacks grew in the same direction as arrays, the software followed the principle of least privilege in setting permission bits, and addressing errors were virtually always caught by the hardware at the instant they occurred, rather than leading to a later system meltdown. There are several secure Unix efforts that adopt some of the same underlying ideas to reduce vulnerability, but non-executable stacks and non-writable program areas require at least some hardware support, and the MMU's of some of the most common hardware platforms--i386 and powerpc--are claimed to be inadequate in one way or another (I haven't checked those claims).

  4. Storing a return point in the midst of writable data is hazardous because it is hard to protect the return point against either programming errors or malicious attacks. People have devised various schemes to reduce the hazard. One such scheme is to have the procedure prologue store an unpredictable constant (a "canary") adjacent to the stack cell that holds the return point and, before using that cell as a return point, verify that the canary is intact by comparing it with a copy stored in a different class of storage, such as the heap.
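The idea can be modeled in portable C. This is illustrative only: real canaries are planted by the compiler (for example, gcc's -fstack-protector), and the reference copy usually lives in a register or thread-local storage rather than a named global.

```c
/* Illustrative model of a stack frame with a canary between the
   buffer and the saved state.  The prologue plants a hard-to-guess
   value; the epilogue compares it with the master copy before the
   return point is trusted.  A linear overrun of buf must trample the
   canary on its way to the return point. */
static unsigned long canary_ref = 0x5aa5c3d2UL;  /* would be randomized */

struct frame {
    char buf[30];
    unsigned long canary;   /* between buf and the saved return point */
};

void prologue(struct frame *f)
{
    f->canary = canary_ref;
}

int epilogue_ok(const struct frame *f)
{
    return f->canary == canary_ref;   /* 0 => overrun detected */
}
```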

  5. (From Ron Rivest) When security is involved, Worse is Better seems to be problematic; The Right Thing appears to be the only way.

Comments and suggestions: Saltzer@mit.edu