For operating-system-intensive applications, the ability of designers
to understand system-call performance behavior is essential to
achieving high performance. Conventional performance tools, such as
monitoring tools and profilers, collect and present their information
off-line or via out-of-band channels. We believe that making this
information first-class, by exposing it to applications through
in-band channels on a per-call basis, creates opportunities for
performance analysis and tuning that other mechanisms cannot provide.
Furthermore, our approach provides direct feedback to applications on
time spent in the kernel, resource contention, and time spent blocked,
allowing them to immediately observe how their actions affect kernel
behavior. Not only does this approach provide greater
transparency into the workings of the kernel, but it also allows
applications to control how performance information is collected,
filtered, and correlated with application-level events.
To demonstrate the power of this approach, we show that our
implementation, DeBox, obtains precise information about OS behavior
at low cost, and that it can be used for debugging and tuning
application performance on complex workloads. In particular, we focus
on the industry-standard SpecWeb99 benchmark running on the Flash Web
Server. Using DeBox, we are able to diagnose a series of problematic
interactions between the server and the OS. Fixing these issues and
exploiting other optimization opportunities yields an overall fourfold
improvement in our SpecWeb99 score, throughput gains on other
benchmarks, and latency reductions by factors ranging from 4 to 47.
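As a rough, application-side illustration of the kind of per-call record described above (and not DeBox's kernel-level mechanism), the following C sketch wraps a single read() call and reports its wall-clock time, the kernel and user CPU time charged to the process, an estimate of time spent blocked, and the number of voluntary context switches. The names struct call_profile and profiled_read are hypothetical. The sketch relies only on the standard getrusage() and clock_gettime() interfaces, so its figures are coarse: RUSAGE_SELF aggregates all threads in the process, rusage times may have tick granularity on some systems, and wall time minus CPU time also counts time spent runnable but not scheduled.

```c
#define _DEFAULT_SOURCE /* expose clock_gettime() and the full struct rusage */
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical per-call record; not DeBox's actual interface. */
struct call_profile {
    double wall_us;    /* elapsed wall-clock time of the call */
    double sys_us;     /* kernel CPU time charged to the process */
    double user_us;    /* user CPU time charged to the process */
    double blocked_us; /* rough estimate: wall - (sys + user), floored at 0 */
    long vol_switches; /* voluntary context switches, a proxy for blocking */
};

static double tv_us(struct timeval tv)
{
    return (double)tv.tv_sec * 1e6 + (double)tv.tv_usec;
}

static double ts_us(struct timespec ts)
{
    return (double)ts.tv_sec * 1e6 + (double)ts.tv_nsec / 1e3;
}

/* Hypothetical wrapper: performs read(2) and fills *p with per-call figures. */
static ssize_t profiled_read(int fd, void *buf, size_t len,
                             struct call_profile *p)
{
    struct rusage r0, r1;
    struct timespec t0, t1;

    assert(p != NULL);
    getrusage(RUSAGE_SELF, &r0); /* process-wide: includes all threads */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    ssize_t n = read(fd, buf, len);
    int saved_errno = errno; /* preserve read()'s errno for the caller */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    getrusage(RUSAGE_SELF, &r1);

    p->wall_us = ts_us(t1) - ts_us(t0);
    p->sys_us = tv_us(r1.ru_stime) - tv_us(r0.ru_stime);
    p->user_us = tv_us(r1.ru_utime) - tv_us(r0.ru_utime);
    double cpu_us = p->sys_us + p->user_us;
    /* Coarse rusage sampling can make CPU time exceed wall time; floor at 0. */
    p->blocked_us = p->wall_us > cpu_us ? p->wall_us - cpu_us : 0.0;
    p->vol_switches = r1.ru_nvcsw - r0.ru_nvcsw;

    errno = saved_errno;
    return n;
}
```

A caller might, for example, log the record only when blocked_us or vol_switches is nonzero, correlating slow calls with its own application-level events. Gathering the data inside the kernel, as DeBox does, avoids the extra getrusage() calls and can observe resource contention and blocking directly rather than inferring them from aggregate counters.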