In this paper, we have examined server latency and traced much of the problem to head-of-line blocking within filesystem-related kernel queues. This behavior may have little impact on throughput, but it severely degrades latency and service quality. By examining individual request latencies, we find that this blocking gives rise to a phenomenon we call service inversion, where requests are served unfairly.
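The effect of head-of-line blocking on individual request latencies can be illustrated with a toy model. The sketch below is not the servers' implementation; the function names, the workload, and the 10 ms threshold are hypothetical, and it assumes all requests arrive at time 0 and that a single helper is available to absorb blocking requests.

```python
def fifo_latencies(service_times):
    """Completion latency of each request when one queue serves them
    strictly in arrival order (all requests assumed to arrive at time 0)."""
    clock = 0.0
    latencies = []
    for t in service_times:
        clock += t
        latencies.append(clock)
    return latencies


def offloaded_latencies(service_times, blocking_threshold):
    """Same workload, but any request whose service time is at or above
    blocking_threshold is handed to a separate helper, so it no longer
    stalls the requests queued behind it. (Hypothetical simplification.)"""
    main_clock = 0.0
    helper_clock = 0.0
    latencies = []
    for t in service_times:
        if t >= blocking_threshold:
            helper_clock += t
            latencies.append(helper_clock)
        else:
            main_clock += t
            latencies.append(main_clock)
    return latencies


# Hypothetical workload in milliseconds: one disk-bound request
# queued ahead of four requests that need only 1 ms each.
workload = [50.0, 1.0, 1.0, 1.0, 1.0]

blocked = fifo_latencies(workload)
offloaded = offloaded_latencies(workload, blocking_threshold=10.0)

for name, result in (("blocked", blocked), ("offloaded", offloaded)):
    print(name, result, "finished at", max(result), "ms")
```

In this toy workload, all five requests finish within a similar total time in both cases, so throughput barely moves, yet the four short requests see their latencies fall from above 50 ms to a few milliseconds once the blocking request stops holding up the queue.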
By addressing the blocking issues in both the Apache and Flash servers, we improve latency by more than an order of magnitude and demonstrate a qualitative change in the latency profiles. We made these changes in user space, in a portable manner, without requiring any modification to the kernel or filesystem layout. Without much effort or extensive modification, we were able to apply these changes to a widely deployed legacy server. The resulting servers also exhibit lower burstiness and fairer request handling. Their latency scales better with improvements in processor speed than that of their original counterparts, making them better candidates for future improvements. Finally, our results suggest that most server-induced latency is tied to blocking effects rather than to queuing.
In addition to its practical benefit, namely the delivery of servers with much better latency properties, this work improves our fundamental understanding of the interactions among the filesystem, application, and workloads. By addressing the root causes of latency increase in network servers, we believe that we can enhance research in other areas, such as improving quality of service or scheduler policies.