5 Related Work
5.1 Runtime Guarding Against Stack-Smashing and Format String Attacks
These techniques transform or augment a program to protect the return
address or other specific values from being overwritten. Stackguard [11] is a modified version of the gcc
compiler in which the generated code places canary values around the
return address at runtime, so that any overflow which overwrites the
return address will also modify the canary value, enabling the overflow to
be detected. StackShield [6] and RAD [9] are
based upon a similar modification to the compiler, but keep a separate
copy of the return address instead of using canary values. Libsafe
and Libverify [6] are dynamically loaded libraries which
provide protection for the return address without requiring recompilation.
Etoh and Yoda [16] use a source-code transformation approach which uses both
canary values and relocates stack-allocated arrays so that they cannot
overflow into local variables. FormatGuard [12]
transforms source code using a modified version of cpp (the C
Preprocessor) combined with a wrapper function for the printf
function, so that format-string attacks are detected at runtime.
While these techniques are useful for guarding against specific attacks,
their drawback is that they can deal with only a small subset of the total
set of memory exploits shown in Figure 2.
5.2 Runtime Bounds and Pointer Checking
These techniques prevent buffer overflows by checking each memory access
operation that can potentially cause a memory error to ensure that it does
not happen. Approaches used to insert the required checks have included
source-to-source translation [25, 5],
specially modified compilers [36, 22],
binary rewriting [19], and virtual machines/interpreters [24].
All of the above techniques currently suffer from significant drawbacks:
runtime overheads that can often be over 100%, restriction to a subset of
C-language, and changes to the memory model or pointer semantics. In contrast,
the focus of this paper is on techniques that produce very low overheads
and are fully compatible with all C-programs.
5.3 Compile-Time Analysis Techniques
Compile-time analysis techniques
[18, 32, 37, 14, 26]
analyze a program's source code to
determine which array and pointer accesses are safe.
While these approaches are a welcome
component of any programmer's debugging arsenal, they generally
suffer from one or more of the following shortcomings:
they do not detect all memory errors,
they generate many false positive warnings,
and/or they do not scale to large programs.
The focus of our work is the
development of techniques that require no additional effort on the
part of programmers, and hence can be applied to the vast base of
existing software, in binary form, with no programmer effort.
Hybrid approaches perform runtime memory-error checking, but also
use static analysis to minimize the number of checks. CCured [28] and Cyclone [21] are two
recent examples of this approach. One difficulty with these
approaches is that they are not 100% compatible with existing
C-code. Moreover, they disable explicit freeing of memory, and
rely on garbage collection.
5.4 Code Obfuscation
Code obfuscation [38, 10, 4] is a program
transformation technique which attempts to convolute the low-level
semantics of programs without affecting the user-observable behavior,
making obfuscated programs difficult to understand, and thereby difficult
to reverse-engineer.
The key difference between program obfuscation and address obfuscation is
that program obfuscation is oriented towards preventing most static
analyses of a program, while address obfuscation has a more limited goal
of making it impossible to predict the relative or absolute addresses of
program code and data. Other analyses, including reverse compilation,
extraction of flow graphs, etc., are generally not affected by address
obfuscation.
5.5 Randomizing Code Transformations
As mentioned earlier, address obfuscation is an instance of the broader
idea of introducing diversity in nonfunctional aspects of software, an
idea suggested by Forrest, Somayaji, and Ackley [17]. Their
implementation model was called a randomizing compiler, which can
introduce randomness in several non-functional aspects of the compiled
code without affecting the language semantics. As a proof of concept, they
developed a modification to the gcc compiler to add a random amount of
padding to each stack allocation request. This transformation defeats most
stack-smashing attacks prevalent today, but does not work against the
large overflow attacks of the sort described in Section 3.
In the past year or two, several researchers [8, 1, 39, 15]
seem to have independently attempted to develop randomization as a
practical approach to defeat buffer-overflow and related attacks.
Work by
Chew and Song [8]
randomizes the base address of the stack, system call numbers,
and library entry points, through a combination of program loader modifications,
kernel system call table modifications, and binary rewriting.
Xu, Kalbarczyk, and Iyer developed transparent runtime randomization [39],
in which the Linux kernel is modified to randomize
the base address of stack, heap, dynamically loaded libraries, and
GOT.
The PaX project's address space layout randomization (ASLR) approach [1]
randomizes the base address of each program region: heap, code, stack,
data. Of these, the ASLR approach is the most advanced in terms of its
implementation. As noted earlier, ASLR is vulnerable to attacks that rely
on adjacency information such as the relative addresses between variables
or code, and attacks that can provide information about the base addresses
of different memory segments. The introduction of additional randomization
in address obfuscation, in the form of random-sized gaps within stack
frames and blocks allocated by malloc, reordering of (and random padding
within) code and static variables, can address these weaknesses. Another
important difference between the above works and ours is that our
obfuscations are implemented using program transformations, whereas the
other works are implemented using operating system modifications. For this
reason, our approach can be more easily ported to different operating
systems. Moreover, it can protect individual (security-critical)
applications without having to make any changes to the rest of the system.
The PointGuard [13] approach complements ours in that
it randomizes (``encrypts'') stored pointer values, as opposed to the locations
where objects are stored. The encryption is achieved by xor'ing
pointer values with a random integer mask generated at the beginning
of program execution. It shares many of the benefits (such as broad
protection against a wide range of pointer-related attacks) and weaknesses
(susceptibility to attacks that read victim process memory to identify
the mask). The principal differences are that (a) PointGuard does not
protect against attacks that do not involve pointer values, e.g.,
attacks that modify security-critical data through a buffer overflow,
and (b) probability of successful attacks is smaller with PointGuard
than with address obfuscation since the range of randomization can
be as large as the address space. It should also be noted that
PointGuard is dependent on the availability of accurate type information.
Many C-language features, such as the ability to operate on untyped buffers
(e.g., bzero or memcpy),
functions that take untyped parameters (e.g., printf), unions
that store pointers and integer values in the same location, can
make it difficult or impossible to get accurate type information, which
means that the corresponding pointer value(s) cannot be protected.