Implementation

We implemented a prototype by extending the GNU C compiler on Linux. We augment each object file with type information about its automatic and static buffers, leaving the source code intact. Specifically, we intercept the output of the gcc preprocessor and append to it a data structure describing the type information. The augmented file is then piped into the next stage to complete the compilation.

The type information of buffers is obtained by precompiling the preprocessed source file with the debugging option turned on and parsing the resulting stabs debugging statements. From these statements we generate a type table, a data structure that associates the address of each function with information about that function's automatic buffers (their sizes and their offsets in the stack frame). The type table also contains the addresses and sizes of the static buffers declared in the source file. This way, each object file independently carries the information about its own automatic and static buffers. The type table is kept in a static variable so that objects can be linked without symbol conflicts. To make the type tables visible at run time, each object file is also given a constructor function³. The constructor function associates its type table with a global symbol. This process is illustrated in Figure 2.
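As a rough illustration of what such an appended data structure could look like, the following sketch defines a per-file type table and registers it from a constructor. The structure layout, the names (`type_table`, `register_type_table`, `type_table_list`, `greet`, `log_line`) and the frame offset are assumptions made for this example, not the prototype's actual definitions; the only compiler feature relied on is GCC's `constructor` function attribute.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*func_ptr)(void);

/* One automatic buffer: its place in the function's frame and its size. */
struct auto_buf {
    long   fp_offset;   /* start of the buffer relative to the frame pointer */
    size_t size;        /* size in bytes */
};

/* All automatic buffers of one function, keyed by the function's address. */
struct func_entry {
    func_ptr               func;
    const struct auto_buf *bufs;
    size_t                 nbufs;
};

/* One static buffer: absolute address and size. */
struct static_buf {
    const void *addr;
    size_t      size;
};

/* Type table of one object file. */
struct type_table {
    const struct func_entry *funcs;
    size_t                   nfuncs;
    const struct static_buf *statics;
    size_t                   nstatics;
    struct type_table       *next;     /* link in the global list */
};

/* Global symbol through which all tables are reachable at run time.
   Assumption: in practice this would live in the checking library. */
struct type_table *type_table_list = NULL;

void register_type_table(struct type_table *t)
{
    assert(t != NULL);
    t->next = type_table_list;
    type_table_list = t;
}

/* ---- What might be appended to one preprocessed source file ---- */

static char log_line[64];                        /* a static buffer of this file */
void greet(void) { char name[32]; name[0] = '\0'; (void)name; }

/* The offset -32 is a made-up value; real offsets would come from stabs. */
static const struct auto_buf   greet_bufs[]   = { { -32, 32 } };
static const struct func_entry file_funcs[]   = { { greet, greet_bufs, 1 } };
static const struct static_buf file_statics[] = { { log_line, sizeof log_line } };

/* Static linkage, so tables from different object files never clash. */
static struct type_table file_table = { file_funcs, 1, file_statics, 1, NULL };

/* Runs before main() and makes this file's table visible globally. */
__attribute__((constructor))
static void register_file_table(void)
{
    register_type_table(&file_table);
}
```

Because only `type_table_list` and `register_type_table` have external linkage in this sketch, any number of object files could carry their own `file_table` and still link together, whether statically or dynamically.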

Our implementation is transparent in the sense that source files are unmodified and programs are compiled normally using the makefile supplied with the source distribution. It is also highly portable because the augmentation is done at the source level. Because the type tables in the object files are assembled at run time, objects can be linked both statically and dynamically.

Range checking is done by a function in a shared library. The range checking function takes a pointer to the buffer as its parameter and determines the size of that buffer. For an automatic buffer it uses the algorithm illustrated in Figure 3; locating a static buffer is straightforward.

**Figure 3:** The stack frame of the buffer is found by comparing the address of the referenced buffer with the saved frame pointers on the stack (the address of buf must be less than its frame pointer because buf is a local variable). The first such frame (in the dashed box) is the frame of the buffer. The return address stored in the next frame is used to locate the entry in the type table (the address of main), which is then used to find the size of the buffer. It is assumed that the stack grows down and that the address of the buffer is that of its least significant byte (little-endian architecture).
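The lookup described in the caption can be sketched on a simulated stack. This is a minimal model under stated assumptions rather than the prototype's code: frames are given as an array ordered from the innermost call outward, the function table is sorted by start address, and the names `frame`, `func_entry`, `find_function` and `buffer_limit` are hypothetical. A real implementation would read saved frame pointers and return addresses from the live stack instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct auto_buf { intptr_t fp_offset; size_t size; };

struct func_entry {
    uintptr_t              start;   /* function entry address */
    const struct auto_buf *bufs;
    size_t                 nbufs;
};

struct frame {
    uintptr_t fp;        /* frame pointer of this frame */
    uintptr_t ret_addr;  /* return address saved in this frame (points into the caller) */
};

/* Entry with the largest start address not above addr.
   Assumes funcs is sorted by ascending start address. */
static const struct func_entry *
find_function(const struct func_entry *funcs, size_t nfuncs, uintptr_t addr)
{
    const struct func_entry *best = NULL;
    for (size_t i = 0; i < nfuncs && funcs[i].start <= addr; i++)
        best = &funcs[i];
    return best;
}

/* Bytes available from buf to the end of its automatic buffer,
   or 0 if the buffer cannot be identified. */
size_t buffer_limit(const struct frame *frames, size_t nframes,
                    const struct func_entry *funcs, size_t nfuncs,
                    uintptr_t buf)
{
    assert(frames != NULL && funcs != NULL);

    /* 1. The stack grows down, so the owning frame is the first one
          whose frame pointer lies above the buffer address. */
    size_t i = 0;
    while (i < nframes && frames[i].fp <= buf)
        i++;
    if (i == 0 || i == nframes)
        return 0;   /* no callee frame to consult, or not on this stack */

    /* 2. The return address saved in the callee's frame points into
          the function that owns frame i. */
    const struct func_entry *f =
        find_function(funcs, nfuncs, frames[i - 1].ret_addr);
    if (f == NULL)
        return 0;

    /* 3. Find the buffer of that function which contains buf. */
    for (size_t k = 0; k < f->nbufs; k++) {
        uintptr_t start = frames[i].fp + (uintptr_t)f->bufs[k].fp_offset;
        uintptr_t end   = start + f->bufs[k].size;
        if (buf >= start && buf < end)
            return (size_t)(end - buf);
    }
    return 0;
}
```

For example, if a hypothetical main starts at 0x1000 with a 32-byte buffer at offset -40, its frame pointer is 0x7100, and the callee's frame (frame pointer 0x7000) holds the return address 0x1050, then a pointer to the start of the buffer yields a limit of 32 bytes and a pointer 10 bytes into it yields 22.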

The shared library also maintains a table of currently allocated heap buffers by intercepting the malloc(), realloc() and free() functions (a feature of the dynamic memory allocator in the GNU C library). For a heap buffer, the size of the referenced buffer is taken to be the size of the allocated memory block. Without type information, the library currently cannot determine the exact size of the buffer within that block, and the discrepancy may be significant, as is evident in Figure 1. The shared library is preloaded so that it intercepts the vulnerable copy functions in the C library and performs range checking on them.
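The heap bookkeeping and a checked copy can be sketched as follows. This is a simplified illustration, not the library itself: it uses explicit wrapper functions with hypothetical names (`tracked_malloc`, `tracked_free`, `heap_limit`, `checked_strcpy`) instead of the allocator's interception mechanism, keeps a small fixed-size table, omits realloc(), and refuses an overflowing copy by returning NULL, which is an assumption about the policy.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BLOCKS 1024

struct heap_block { char *base; size_t size; };

static struct heap_block blocks[MAX_BLOCKS];
static size_t nblocks = 0;

/* Allocate and remember the block. Zero-size allocations and blocks
   beyond MAX_BLOCKS are simply not tracked in this sketch. */
void *tracked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL && size > 0 && nblocks < MAX_BLOCKS) {
        blocks[nblocks].base = p;
        blocks[nblocks].size = size;
        nblocks++;
    }
    return p;
}

/* Forget the block (if it was tracked) and release it. */
void tracked_free(void *p)
{
    for (size_t i = 0; i < nblocks; i++) {
        if (blocks[i].base == p) {
            blocks[i] = blocks[--nblocks];   /* fill the hole with the last entry */
            break;
        }
    }
    free(p);
}

/* Bytes from ptr to the end of the tracked block containing it,
   or 0 if ptr is not inside any tracked block. */
size_t heap_limit(const void *ptr)
{
    uintptr_t a = (uintptr_t)ptr;
    for (size_t i = 0; i < nblocks; i++) {
        uintptr_t base = (uintptr_t)blocks[i].base;
        if (a >= base && a - base < blocks[i].size)
            return blocks[i].size - (size_t)(a - base);
    }
    return 0;
}

/* strcpy that refuses to write past the end of a tracked heap block.
   Destinations that are not tracked are copied without a check. */
char *checked_strcpy(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t limit = heap_limit(dst);
    if (limit != 0 && strlen(src) + 1 > limit)
        return NULL;                  /* would overflow the block */
    return strcpy(dst, src);
}
```

Note that the limit is the end of the whole allocated block: if the destination is a field inside a heap-allocated structure, this check cannot detect an overflow into the neighbouring fields, which is the imprecision discussed above.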