HOW TO FIND MEMORY CORRUPTION BUGS
by Greg Hudson
Written 7/22/95
Last updated 7/22/95

Some of the most frustrating errors you will come across in Unix C and C++ programs are errors which corrupt memory. This document describes various approaches to testing for and finding memory corruption errors in Unix programs.

The document is organized as follows:

1. Analysis of the problem
2. Specific approaches
   1. Purify
   2. Checker
   3. Electric Fence
   4. Codecenter
   5. gcc with bounds checking
   6. Debugging malloc libraries
A. Software available on Athena

1. ANALYSIS OF THE PROBLEM

Bugs in any kind of program can be divided into two categories: bugs which cause visibly incorrect behavior as soon as the incorrect code executes, and bugs which corrupt state (variable values, data structures, files, etc.) such that correct code behaves incorrectly later on. Bugs in the first category are usually easy to find and fix, since you can simply trace the execution of the code up to the point of the incorrect behavior and see which piece of code failed. Bugs in the second category are often much harder to find, since there is no simple way of determining where the state of the program was corrupted.

Memory corruption bugs can fall into either category. If your Unix program tries to dereference an uninitialized pointer, it will usually fail immediately (with a "segmentation fault" or "bus error"). On the other hand, if your Unix program writes beyond the end of an array or allocated block of memory, it may crash much later on, and you may have a very difficult time finding out where your memory storage became corrupted.

It is important to note that not all memory corruption bugs are state corruption bugs. Your program's memory storage, as managed by the compiler's stack allocation and malloc(), is a part of your program's state, which has well-defined, standardized invariants as well as a set of operations (such as reading or writing freed memory) which are never valid.
Your program almost certainly has other invariants (which may not be documented or well-defined), and corruption of these invariants may lead to memory corruption indirectly. Thus, finding memory corruption bugs may only reveal another layer of state corruption in your own data structures. The best way to find corruption in your own data is to (a) carefully define and document the invariants of your program's state, and (b) periodically check your program's state, either piecemeal using assert() calls sprinkled around your code, or by writing a procedure to check some large amount of your program's state and calling it wherever you suspect state might have been corrupted.

That said, the remainder of this document will focus on finding the errors which are not valid according to the standard C and C++ memory management discipline:

* Reading or writing unallocated memory
* Reading uninitialized memory
* Freeing memory which was never allocated
* Freeing memory which was already freed
* Memory leaks (allocated memory which cannot be referenced directly or indirectly through pointers on the stack or in static storage)
* Array references beyond the bounds of an array (which may happen to reference allocated, initialized memory, but are still invalid)

There is no single, perfect method for finding all six categories of problems, but there are approaches which will find problems in most cases. This document will discuss approaches which try to find memory access violations by checking every memory reference, and other approaches which try to detect past violations.

2. SPECIFIC APPROACHES

2.1. PURIFY

Purify is a commercial product produced by Pure Software, Inc. Purify works by modifying the executable code produced by the C or C++ compiler to check every memory access. As such, Purify is heavily dependent on the instruction set of the target platform; at the time of this writing, Purify only runs on Sun Sparc machines under SunOS or Solaris.

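To make "checking every memory access" concrete, here is a minimal sketch in C of one way such a check can be expressed: keep a shadow record of the state of each byte and consult it on every read, write, allocation, and free. This is only an illustration of the general idea, not a description of how Purify is actually implemented. The arena, the three byte states, and every function name below (track_alloc, track_free, checked_read, checked_write) are hypothetical and were invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative only: all names here are hypothetical. */
enum byte_state { UNALLOCATED = 0, UNINITIALIZED, INITIALIZED };

#define ARENA_SIZE 64

static unsigned char arena[ARENA_SIZE];   /* the memory being tracked */
static unsigned char shadow[ARENA_SIZE];  /* one state per arena byte */
static int error_count;                   /* number of violations seen */

static void report(const char *what, size_t off)
{
    fprintf(stderr, "access error: %s at offset %lu\n",
            what, (unsigned long)off);
    error_count++;
}

/* Mark [off, off+len) as allocated but not yet written. */
static void track_alloc(size_t off, size_t len)
{
    assert(off <= ARENA_SIZE && len <= ARENA_SIZE - off);
    memset(shadow + off, UNINITIALIZED, len);
}

/* Mark [off, off+len) as freed; report it if any byte was not allocated. */
static void track_free(size_t off, size_t len)
{
    size_t i;

    assert(off <= ARENA_SIZE && len <= ARENA_SIZE - off);
    for (i = off; i < off + len; i++) {
        if (shadow[i] == UNALLOCATED) {
            report("free of unallocated memory", i);
            return;
        }
    }
    memset(shadow + off, UNALLOCATED, len);
}

/* Write one byte, but only if the target byte is allocated. */
static void checked_write(size_t off, unsigned char value)
{
    if (off >= ARENA_SIZE || shadow[off] == UNALLOCATED) {
        report("write to unallocated memory", off);
        return;
    }
    arena[off] = value;
    shadow[off] = INITIALIZED;
}

/* Read one byte, reporting unallocated or uninitialized reads. */
static unsigned char checked_read(size_t off)
{
    if (off >= ARENA_SIZE || shadow[off] == UNALLOCATED) {
        report("read of unallocated memory", off);
        return 0;
    }
    if (shadow[off] == UNINITIALIZED)
        report("read of uninitialized memory", off);
    return arena[off];
}
```

In this sketch, calling track_alloc(0, 4) and then checked_read(0) reports a read of uninitialized memory, and calling track_free(0, 4) twice reports the second call as a free of unallocated memory. Note that a scheme like this cannot tell when an out-of-bounds array reference happens to land in some other allocated block, since the shadow state only records whether a byte is allocated, not which block it belongs to.
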
Of the approaches we will discuss, Purify is probably the easiest to use. You simply prepend "purify" to the link line you use to build your program, and Purify will instrument all of the object files and libraries with memory access checks. When you run the program, Purify will display a window on your X display reporting reads or writes to unallocated memory, reads of uninitialized memory, and frees of memory which was never allocated or which was previously freed. (If you do not have an X display, or if you specifically disable the window option, Purify will display reports of problems to standard error [XXX or is it /dev/tty?].) Purify has a flexible mechanism for ignoring expected access violations depending on the call chain (for instance, the Solaris C library often reads uninitialized memory).

After your program has finished running, Purify will look for potential memory leaks using a conservative garbage collection strategy (that is, Purify may fail to find some memory leaks because it treats numeric values as pointers, but such confusions are rare, since the vast majority of numeric values in most programs are not valid pointer values).

Purify has the following limitations:

* It is commercial, and expensive.
* It only runs on Sun Sparcstations.
* It will not work correctly on multithreaded code (because of the multiple thread stacks).
* It will not find out-of-bounds array references if they happen to refer to allocated memory.

2.2. CHECKER

Checker is similar to Purify, but is free [XXX developed by who?] and runs on Intel machines under the Linux operating system. Checker instruments the assembly output of gcc, and is therefore as platform-dependent as Purify. Checker does not have all the user interface features of Purify, and it will also [XXX check] not instrument libraries which have already been compiled.
I have never used Checker myself, but apart from not instrumenting previously compiled libraries, it should find the same kinds of errors as Purify does and have the same limitations. I don't know if Checker will help find memory leaks.

2.3. ELECTRIC FENCE

The Electric Fence is another free Linux tool, developed by Bruce Perens, to find memory access violations. The Electric Fence uses mmap() to put different pieces of allocated memory in different virtual memory pages. I have not used the Electric Fence and do not know the details of its strategy, but its approach may give it some level of platform-independence and will also help find some out-of-bounds array reference errors which wouldn't be found by Purify or Checker (specifically, if the out-of-bounds access would ordinarily refer to another block of allocated memory, but does not because the mmap() strategy separates different blocks of memory).

2.4. CODECENTER

Codecenter (previously called Saber) is a commercial product of CenterLine. Codecenter is a full-fledged C interpreter, and can therefore check all memory access violations, including out-of-bounds array accesses which can't be found by Purify or Checker. As an interpreter, Codecenter is not highly platform-dependent (although, since it is a commercial product, you must rely on CenterLine to port it to new platforms), but it is heavily language-dependent [XXX does it do C++ right now?]. [XXX multithreaded code?]

2.5. GCC WITH BOUNDS CHECKING

2.6. DEBUGGING MALLOC LIBRARIES

A. SOFTWARE AVAILABLE ON ATHENA