

Performance


Table 1: The number and size of all run-time memory allocations (smalloc, scalloc, srealloc) performed by our test applications during a brief test run, and the percentage of those allocations that handled sensitive data. We count only allocations made by the applications themselves, not by any libraries they use.

                   GNOME Calendar            J-Pilot                   OpenSSH client
  Size (bytes)     requests   % sensitive    requests   % sensitive    requests   % sensitive
  0 - 1023         4216       86.9%          5914       26.7%          2073       97.6%
  1024 - 2047      9          77.5%          0          -              46         100%
  2048 - 3071      1          0%             1          100%           67         100%
  3072 - 4095      7          85.7%          1          100%           3          100%
  4096 - 5119      2          50%            2          100%           50         100%
  5120 - 6143      1          100%           0          -              3          100%
  6144 - 7167      0          -              0          -              3          100%
  7168 - 8191      7          100%           0          -              2          100%
  8192 - 9215      0          -              0          -              1          100%
  9216+            9          100%           0          -              12         100%
  Total            4252       86.9%          5918       26.8%          2260       97.8%



Table 2: Time needed for scp to transfer 100 megabytes of data to a server on the same machine. The results demonstrate that using a shadow stack gives much better performance than storing sensitive locals in the heap.

  Sensitive locals moved to:   Elapsed time (s)   Increase over baseline
  Heap                         27.24              33%
  Shadow stack                 21.71              6%
  Baseline                     20.51              -



Table 3: Results of the greatest common divisor (GCD) microbenchmark, which performs 50 million GCD computations on pairs of integers. Besides the baseline, the two Scrash configurations place the sensitive local variables on the heap and on the shadow stack, respectively. The "Heap allocs" column gives the number of heap allocations the microbenchmark makes. The results demonstrate that using a shadow stack gives much better performance than storing sensitive locals in the heap.

  Sensitive locals moved to:   Heap allocs   Elapsed time (s)   Increase over baseline
  Heap                         657,393,865   242.64             373%
  Shadow stack                 2             62.42              22%
  Baseline                     1             51.28              -


Finding applications that are both privacy-relevant and performance-critical proved to be a tricky exercise. Many of the applications for which one would be concerned about leaks of personal data are interactive: editors, browsers, information management tools, or remote access programs such as ssh. In the course of testing, we found that the Scrash transformations did not reduce the responsiveness of the interactive applications we tested. To better quantify the performance impact of Scrash, we ran two tests: one on a real application and one on a microbenchmark that illustrates worst-case behavior.

To test Scrash against a privacy-sensitive program that also has performance requirements, we chose to transfer large files using scp. This program has many of the same privacy vulnerabilities as ssh, as it in fact calls the ssh executable. The program is non-interactive, allowing us to measure changes in performance easily. For the tests, we transferred a 100-megabyte file to a server on the same machine to eliminate network-induced performance variability.

The second test is a microbenchmark that exercises the call stack heavily. We wrote this recursion-intensive benchmark to expose the worst-case performance of Scrash, since every function entry and exit requires intervention from Scrash. Furthermore, all locals are placed on the sensitive stack, which increases memory access times. The microbenchmark computes the greatest common divisor of 50 million pairs of random numbers, where each number lies between 1 and 10 million. The benchmark uses Euclid's algorithm, which admits a natural recursive implementation. Each invocation performs very little computation: a modulus operation, two comparisons and assignments, and a recursive call. Because of the heavy use of recursion, the stack maintenance overhead that this microbenchmark incurs is significantly greater than that of a typical application. We used the same random seed for all implementations, so each test run makes 657,393,863 function calls. To exercise the Scrash transformations, we marked all local variables as sensitive, as well as one global variable that we used to count the number of function calls.
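To make the structure of the benchmark concrete, a minimal sketch is shown below. The function and variable names (gcd, call_count) and the random-number handling are illustrative rather than the actual benchmark source; in the Scrash runs, every local variable and the global call counter were marked sensitive.

    /* Minimal sketch of the GCD microbenchmark; names and structure are
     * illustrative, not the exact benchmark source.  In the Scrash runs,
     * all locals and the global call counter were marked sensitive. */
    #include <stdio.h>
    #include <stdlib.h>

    static long call_count = 0;     /* global call counter (marked sensitive) */

    /* Euclid's algorithm, implemented recursively: each call does a
     * modulus, a couple of comparisons/assignments, and a recursive call. */
    static long gcd(long a, long b)
    {
        long r;                     /* local (marked sensitive) */
        call_count++;
        if (b == 0)
            return a;
        r = a % b;
        return gcd(b, r);
    }

    int main(void)
    {
        long i, result = 0;
        srandom(42);                /* fixed seed so every configuration sees the same inputs */
        for (i = 0; i < 50000000L; i++) {
            long x = random() % 10000000 + 1;
            long y = random() % 10000000 + 1;
            result ^= gcd(x, y);
        }
        printf("calls: %ld (checksum %ld)\n", call_count, result);
        return 0;
    }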

We performed these tests on a 1.5 GHz Pentium 4 with 1 gigabyte of RAM, running a Linux 2.4.18 kernel, with gcc 2.95 and -O3 optimization enabled. The tests were conducted under three configurations: without Scrash (the baseline), using Scrash to place all sensitive local variables on the heap, and using Scrash's shadow stack to hold the sensitive local variables. The results, shown in Tables 2 and 3, are averages of three separate test runs per configuration. See Section 3.4.2 for a description of the two Scrash configurations.

Our initial strategy of moving sensitive stack variables to the heap via a call to smalloc at the beginning of each applicable function, as described in Section 3.4.2, resulted in a large performance penalty: 33% overhead for ssh and 373% for the microbenchmark. The microbenchmark suffers a larger overhead because it performs over 600 million allocation and free calls, one for every procedure entry. It also shows a much higher percentage increase over the baseline because function entry accounts for a larger fraction of CPU time in the microbenchmark than in ssh.
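The following sketch illustrates the kind of rewriting this strategy implies for a single sensitive local. The prototypes are assumptions made for illustration: smalloc appears in Table 1, and we posit an sfree counterpart with a free-like signature; this is a sketch of the transformation's shape, not Scrash's actual output.

    #include <stddef.h>

    /* Provided by the Scrash runtime; the prototypes shown here are assumed
     * to mirror malloc/free and are given only for illustration. */
    extern void *smalloc(size_t size);   /* sensitive-heap allocator (see Table 1) */
    extern void  sfree(void *ptr);       /* assumed counterpart that releases a sensitive block */

    /* Before the transformation: a function with one sensitive local. */
    int double_key(int key)
    {
        int temp;            /* sensitive local on the ordinary stack */
        temp = key * 2;
        return temp;
    }

    /* After the heap transformation (sketch): the sensitive local lives in
     * the sensitive heap, costing one allocation and one free per call. */
    int double_key_heap(int key)
    {
        int *temp = smalloc(sizeof(int));
        int result;

        *temp = key * 2;     /* every access goes through the pointer */
        result = *temp;
        sfree(temp);
        return result;
    }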

The second strategy, using the Scrash transformations to implement a shadow stack, added much less overhead: 22% for the GCD microbenchmark and 6% for ssh, a modest cost for the realistic application scenario. This overhead comes from maintaining the shadow stack pointer at the beginning and end of each function, as well as from the extra level of indirection required to access sensitive local variables.
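A rough sketch of the shadow-stack bookkeeping appears below. The global pointer scrash_shadow_sp and the per-function frame struct are hypothetical names used only to illustrate the entry/exit maintenance and the extra indirection; they are not Scrash's actual runtime symbols or generated code.

    /* Hypothetical sketch of the shadow-stack transformation. */
    extern char *scrash_shadow_sp;   /* assumed: top of the separately allocated shadow stack */

    int double_key_shadow(int key)
    {
        struct frame { int temp; };                  /* this function's sensitive locals */
        struct frame *f = (struct frame *)scrash_shadow_sp;
        int result;

        scrash_shadow_sp += sizeof(struct frame);    /* function entry: push frame */

        f->temp = key * 2;                           /* each access costs one extra indirection */
        result = f->temp;

        scrash_shadow_sp -= sizeof(struct frame);    /* function exit: pop frame */
        return result;
    }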

We see that the shadow stack gives much better performance than placing sensitive locals on the heap. Consequently, we enable the shadow stack by default.

We conclude that Scrash adds only a minimal performance overhead to real applications.

