Check out the new USENIX Web site. next up previous
Next: 4.3 Variable Argument Functions Up: 4 Finding Format String Previous: 4.1 Leaf Polymorphism

4.2 Explicit Type Casts

The treatment of type casts in the program's source code is very important to the correct operation of our tool. In most cases, a pointer cast is used to implement generic functions, to emulate object subtyping, or to otherwise bypass the limitations of the C type system. Since a pointer cast usually preserves the semantic meaning of the data being pointed to, we want to preserve the taintedness of data through ordinary C typecasts. Consider the following program fragment:

  void *y;
  char *x = (char *) y;
If y is tainted, then x should also be tainted, even though their types do not otherwise match.

Casts to void * are particularly problematic because one can cast any type to a void *. For example, a programmer might write

  char **s, **t;
  void *v = (void *) s;
  t = (char **) v;
Here the type structure of v has two levels, while the type structure of s has three. Hence there is no direct correspondence between the qualifiers of the two types.

We solve this problem by ``collapsing'' qualifiers at a type cast. If we cast a type $ t$ to a type $ u$, then we match up the qualifiers level-by-level between $ t$ and $ u$ as deeply as possible. For example, when casting char *x to a void *, we add the constraints $ x \leq \hbox{\textit{cast}}$ and $ x\_p =
\hbox{\textit{cast\_p}}$, where cast is the name we use for the qualifiers on the void *. As soon the structures of types $ t$ and $ u$ diverge, we equate all the remaining qualifiers. For example, when casting a char **x to a void *, we add the constraints $ x \leq \hbox{\textit{cast}}$ and $ x\_p = x\_p\_p
= cast\_p$. Putting this together, in the above example if if either *s or **s is tainted, then *v becomes tainted. When v is cast to char **t, both *t and **t will become tainted.

We also allow the knowledgeable programmer to indicate that some program data has been validated and should consequently be considered untainted despite its origins. Such an annotation can be expressed in our system by writing an explicit cast to an untainted type. To enable this, we do not add any constraints in case of an explicit cast containing a qualifier. For example, in the following code

  void *y;
  char *x = (untainted char *) y;
the assignment does not taint x, regardless of the inferred taintedness of y.

This feature allows the security-aware developer to implement functions that parse an input string and filter out dangerous substrings without departing from our framework. We assume that the programmer will add such an annotation only after ensuring that the string is validated by some rigorous checking procedure. There is no way to verify this assumption automatically. However, our syntax is designed to make it easy to manually audit all such annotations, since they can typically be easily identified by simply greping the source code for the keyword untainted.

Collapsing the qualifiers at casts is conservative but sound for the most common casts in a program. There are two ways in which our implementation is currently unsound with respect to casts. First, we have found that if we collapse qualifiers on structure fields at type casts, the analysis generates too many false positives (too much becomes tainted). Thus in our implementation if one aggregate is cast to another, we ignore the cast and do not collapse type qualifiers.

Second, because we use a subtyping-based system, the qualifier-collapsing trick does not fully model casts from pointers to integers. Consider the following code:

  char *x, *y;
  int a, b;

  a = (int) x;     (1)
  b = a;           (2)
  y = (char *) b;  (3)
For line (1), we generate the constraints $ x\_p = x = a$. For line (2), we generate the constraint $ a \leq b$. And for line (3), we generate the constraints $ b = y\_p = y$. Notice that we have $ x\_p
\leq y\_p$ but we do not have $ y\_p \leq x\_p$, so our deductions are unsound.

We leave as future work the solution to these problems. We believe that the best solution will be to combine techniques that attempt to recover the semantic behavior of casts with conservative alias analysis for ill-behaved casts [12,36,37].


next up previous
Next: 4.3 Variable Argument Functions Up: 4 Finding Format String Previous: 4.1 Leaf Polymorphism
Umesh Shankar 2001-05-16