We investigate the origin and components of network server latency
under various loads and find that filesystem-related kernel queues
exhibit head-of-line blocking, which leads to bursty
behavior in event delivery and process scheduling. These problems, in
turn, undermine the fairness and scheduling policies of the operating
system, causing requests that could have been served from memory, with
low latency, to wait unnecessarily on disk-bound requests. While this
blocking only mildly affects throughput, it severely degrades latency.
The problem manifests itself as degraded fairness and service quality,
a phenomenon we call service inversion.
We present a portable solution that avoids these problems without
kernel or filesystem modifications. We modify two different Web servers
to use this approach and demonstrate a qualitative change in their
latency profiles, yielding more than an order of magnitude
reduction in latency. The resulting
systems are able to serve most requests without being tied to disk
performance, and they scale better with improvements in processor
speed. These results do not depend on server software architecture and
can be profitably applied to experimental and production servers.