Increasing demand for Web content and services has motivated techniques to grow Web server capacity. As a result, Web server performance has steadily improved and Web-hosting infrastructure has become more complex. Today's Web ``farms'' are multi-tiered and employ several types of specialized server systems dedicated to caching static content, applications, and databases. For example, network service providers use proxy caches to geographically distribute content on behalf of specific customers, reducing bandwidth costs and improving end-user response times. Akamai [1], for instance, has built a commercial service for caching portions of its customers' static content using geographically distributed edge caches. In addition to caching static content, Web servers can cache dynamic content such as price lists, stock quotes, or sports scores. Much dynamic content changes infrequently enough, or is published at coarse enough intervals, to permit caching. Commercial efforts [2] and research projects [3] have successfully exploited the ability to cache most forms of dynamic content at the front tier of a Web delivery architecture. While some forms of dynamic content have real-time publishing requirements and remain difficult to cache, it has been shown [4] that the ratio between all forms of dynamic and static content has remained constant, defying the commonly held belief that dynamic workloads will dominate over time.
Caching Web servers are well suited to analyzing network server performance tradeoffs. They are simple to implement and can be measured with existing, unmodified static Web benchmarks. A caching Web server reduces network server logic to parsing an HTTP GET request, which requires only a few lines of C code. What remains is experimentation with thread models, scheduling mechanisms, and new operating system primitives that reduce memory copies. The simplicity of parsing HTTP GET requests also keeps the code for kernel-mode caching Web servers small. Furthermore, such servers can usually be implemented without modifying the operating system kernel, providing useful control cases for assessing user-mode optimizations.
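To illustrate how little request-handling logic a caching Web server needs, the following sketch extracts the request path from an HTTP GET request line. The function name and interface are illustrative only, not taken from any of the servers measured in this paper:

```c
#include <stddef.h>
#include <string.h>

/* Minimal HTTP GET request-line parser: extracts the request path
 * from a buffer such as "GET /index.html HTTP/1.0\r\n...".
 * Returns 0 on success, -1 if the request is not a well-formed GET. */
static int parse_get(const char *req, char *path, size_t pathlen)
{
    const char *start, *end;

    if (strncmp(req, "GET ", 4) != 0)
        return -1;
    start = req + 4;
    end = strchr(start, ' ');          /* path ends at the next space */
    if (end == NULL || (size_t)(end - start) >= pathlen)
        return -1;
    memcpy(path, start, end - start);
    path[end - start] = '\0';
    return 0;
}
```

The parsed path can then be used directly as a key into the server's content cache, so the entire request-handling fast path reduces to this parse plus a lookup and a send.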
This paper analyzes the performance gap between the fastest currently available user-mode caching Web servers and their kernel-mode counterparts while holding the operating system and hardware fixed. The goal of this analysis is to identify the performance gains potentially available to future user-mode primitives. User-mode servers employing current ``best practices'' are used to establish a baseline for the fastest possible user-mode performance. The tested user-mode servers employ several techniques to minimize data copies and to reduce the overhead of network event notification. The paper also measures several kernel-mode Web servers, including servers based on a platform called the ``Adaptive Fast Path Architecture'' (AFPA). The experiments are repeated for two operating systems: Linux 2.3.51 and Windows 2000. In all cases, the CPU hardware, TCP/IP stack, network hardware, and operating system are held constant.
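One copy-minimization technique available to user-mode servers is scatter-gather I/O, which transmits a response header and a cached body in a single system call without first copying them into one contiguous buffer. The POSIX sketch below, using writev(), is illustrative of the general approach rather than the exact mechanism used by any server measured here; the function name is hypothetical:

```c
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send an HTTP response header and a cached body with one writev()
 * call, avoiding the extra copy that concatenating them into a
 * single buffer would require. Returns bytes written, or -1 on error. */
static ssize_t send_response(int fd, const char *hdr,
                             const char *body, size_t bodylen)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)hdr;     /* response header */
    iov[0].iov_len  = strlen(hdr);
    iov[1].iov_base = (void *)body;    /* cached content */
    iov[1].iov_len  = bodylen;
    return writev(fd, iov, 2);
}
```

Kernel-assisted primitives such as sendfile() on Linux or TransmitFile() on Windows go further by eliminating the user-to-kernel data copy for the body entirely.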
The paper compares the performance of several user-mode and kernel-mode caching Web servers under different workloads. The results show a wide performance margin between the best-performing user-mode and kernel-mode servers. The user-mode servers prove effective at reducing memory copies and reads, and at reducing scheduling overhead through efficient event notification mechanisms and single-threaded, asynchronous I/O implementations. Even so, the best of these efforts are two to six times slower than the fastest kernel-mode performance achieved on the same hardware with the unmodified Linux and Windows 2000 operating systems tested. The results reveal significant potential to improve user-mode server performance.
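The single-threaded, event-driven structure mentioned above multiplexes many connections in one thread by asking the kernel which descriptors are ready before acting on them. The exact notification primitives vary by platform (select(), /dev/poll, Windows completion ports); the portable poll()-based sketch below is a hedged illustration of the idea, and its function name is hypothetical:

```c
#include <poll.h>
#include <stddef.h>

/* Single-threaded readiness check: poll a set of connection
 * descriptors and return the index of the first one that is
 * readable, or -1 if none becomes ready within the timeout. */
static int first_readable(const int *fds, size_t n, int timeout_ms)
{
    struct pollfd pfds[64];
    size_t i;

    if (n > 64)
        return -1;
    for (i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;       /* interested in readability */
        pfds[i].revents = 0;
    }
    if (poll(pfds, n, timeout_ms) <= 0)
        return -1;                     /* timeout or error */
    for (i = 0; i < n; i++)
        if (pfds[i].revents & POLLIN)
            return (int)i;
    return -1;
}
```

Because a single thread services only descriptors the kernel has reported ready, the server avoids both blocking and the context-switch overhead of a thread-per-connection design.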
The paper is organized as follows: Section 2 classifies Web server performance issues, describes current user-mode and kernel-mode approaches, and surveys related work. Section 3 describes the Adaptive Fast Path Architecture, a platform for building kernel-mode network servers. Section 4 describes the methodology used to measure and analyze user-mode and kernel-mode Web server implementations. Section 5 reports and analyzes the performance results for representative user-mode and kernel-mode Web servers. Finally, Section 6 draws conclusions from the performance analysis, and Section 7 makes recommendations regarding Web server design and outlines future work.