Experiments were performed on two operating systems: Windows 2000 Advanced Server (build 2195) and RedHat Linux 6.1 with a Linux 2.3.51 kernel. One server, TUX, does not run on the 2.3.51 kernel and was therefore run on a Linux 2.4.0 kernel instead. AFPA for Windows 2000, IIS, and SWC were run on Windows 2000; AFPA for Linux, kHTTPd, TUX, Zeus, and Apache were run on Linux. To quantify the benefit of serving responses in a software interrupt context, we also implemented a version of AFPA that omits this optimization and instead serves all responses from kernel threads.
All experiments were performed on the same server hardware: an IBM Netfinity 7000 M10 with four 450 MHz Pentium II Xeon processors, 4 GB of RAM, and four Alteon ACEnic gigabit Ethernet adapters. The server has two 33 MHz PCI buses (one 32-bit and one 64-bit), and each bus carried two of the gigabit Ethernet adapters. Distributing the adapters across both PCI buses was necessary to maximize the bandwidth of the memory bus. For all experiments, only one of the server's four CPUs was used; the three empty CPU sockets do not interfere with the uniprocessor experiments. Ten client machines were used to generate load. The clients were IBM Intellistation Z-Pro systems, each with two 450 MHz Pentium II Xeon processors, 256 MB of RAM, and a single Alteon ACEnic gigabit Ethernet adapter. The clients ran RedHat Linux 6.1 and were connected to the server through a pair of Alteon ACEswitch 180 gigabit Ethernet switches.
Based on timing memcpy(), the Netfinity 7000 M10 supports up to 280 MB/s of memory-to-memory bandwidth. In practice, the tested Netfinity hardware can move at most 200 MB/s from main memory to the PCI buses. Counting TCP/IP headers, HTTP requests, and HTTP responses, this limit caps the maximum possible SPECWeb96 result at 11,400 requests per second.
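The bound above is simple arithmetic, and the sketch below makes it explicit. It assumes that "MB" means 10^6 bytes, that the four gigabit adapters each run at a line rate of 125 MB/s, and that the 11,400 requests/s figure is obtained by dividing the 200 MB/s memory-to-PCI limit by the average number of bytes moved per request. The function names are hypothetical, and the roughly 17.5 KB per request that falls out is back-calculated from the stated numbers, not a measured value.

```python
# Back-of-the-envelope throughput bound for the test server (illustrative sketch).
# Inputs come from the text: four gigabit adapters and a ~200 MB/s
# main-memory-to-PCI limit. Assumption: 1 MB = 10**6 bytes.

GIGABIT_BYTES_PER_SEC = 1_000_000_000 // 8  # 125 MB/s line rate per adapter


def bottleneck_bandwidth(adapters: int, mem_to_pci_bytes_per_sec: int) -> int:
    """Smaller of the aggregate NIC line rate and the memory-to-PCI limit."""
    network = adapters * GIGABIT_BYTES_PER_SEC
    return min(network, mem_to_pci_bytes_per_sec)


def implied_bytes_per_request(bandwidth: int, max_requests_per_sec: int) -> float:
    """Average bytes per request (headers + request + response) implied by a bound."""
    return bandwidth / max_requests_per_sec


def max_requests_per_sec(bandwidth: int, bytes_per_request: float) -> float:
    """Upper bound on request rate when every request moves bytes_per_request."""
    return bandwidth / bytes_per_request


if __name__ == "__main__":
    bw = bottleneck_bandwidth(adapters=4, mem_to_pci_bytes_per_sec=200_000_000)
    # 4 x 125 MB/s = 500 MB/s of NIC capacity, so the 200 MB/s path is the limit.
    print(bw)
    print(round(implied_bytes_per_request(bw, 11_400)))
```

Under these assumptions the four adapters offer 500 MB/s, so the memory-to-PCI path rather than the network is the bottleneck. Inverting the 11,400 requests/s bound implies an average of about 17,544 bytes transferred per request.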
All experiments were run using 9000-byte (jumbo) Ethernet frames. We chose jumbo frames rather than standard 1500-byte frames because they allowed our SPECWeb96 results to be compared with officially published results [26]. Limited experiments using standard Ethernet frames did not reveal any significant difference in the performance trends observed with 9000-byte frames.
We note the following limitations of our test methodology. All experiments were performed with the same limited number of client machines. Our results focus almost entirely on uniprocessor rather than multiprocessor servers. Experiments were performed solely with non-persistent connections. Our analysis is limited to static content. Finally, results are reported only for the Linux and Windows 2000 operating systems running on the same Intel processor.