A key challenge in the design of a download server is to adapt automatically to different client rates without manual tuning. The closer the transfer rates of two clients match, the easier it becomes to exploit the data sharing among them. As the difference increases, it becomes more difficult to share cached data effectively.
Figure 8 depicts the network throughput of an unmodified (sequential) and a Circus (out-of-order) server as clients with different rates download a single file of size 512MB. In typical ftpd implementations (including the one that we use here), each active download request spawns an extra server process with a resident memory footprint of about 1MB. Consequently, we show T1 measurements only for loads up to 30-40%, roughly corresponding to about 200 concurrent clients; beyond this point, memory paging interferes with the measurements.
In all three cases with a single client link rate (a-c), the out-of-order network throughput increases proportionally with the system load. In particular, at 40% load we expect 51.2MByte/s of network throughput, which is roughly what we observe in cases (b) and (c). The measured throughput is somewhat lower in case (d), with clients of different rates on the same server, but still reaches 50MByte/s at 50% load. Quite remarkably, the sequential system matches the out-of-order performance only at 10% load in all four cases, and never exceeds 30MByte/s (on average) as the load increases.
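The expected-throughput figures above follow from simple offered-load arithmetic. The sketch below assumes a server network capacity of 128MByte/s (an assumption made here for illustration; it is consistent with 40% load yielding 51.2MByte/s):

```python
# Offered-load arithmetic behind Figure 8.
# SERVER_CAPACITY is an assumed value: 0.40 * 128 = 51.2 matches
# the throughput quoted at 40% load in the text.
SERVER_CAPACITY = 128.0  # MByte/s (assumed)

def expected_throughput(load_fraction, capacity=SERVER_CAPACITY):
    """Aggregate client demand at a given offered load."""
    return load_fraction * capacity

print(expected_throughput(0.40))  # 51.2 MByte/s, as in cases (b) and (c)
print(expected_throughput(0.50))  # 64.0 MByte/s; case (d) reaches ~50
```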
Figure 9 shows the disk throughput for the same experiment. With sequential transfers, the disk is highly utilized even at low loads, regardless of the client rates. In contrast, with out-of-order transfers (a-c) the disk throughput drops to the transfer rate of a single client. For example, the disk throughput is about 1MByte/s with 10T transfers (b), an order of magnitude lower than in the sequential case. When we mix clients of different capacities (d), this behavior holds at low loads, with a disk throughput of about 5.6MByte/s. At higher loads, the proportion of non-sharing (independent) clients increases, raising the disk throughput accordingly. Figure 10 further verifies these observations. With out-of-order transfers, the download latencies remain roughly constant across system loads, as determined by the client rates. With sequential transfers, in contrast, the download latency increases rapidly with the system load.
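The per-client rates quoted above are just the standard carrier rates converted to MByte/s; the sketch below makes the conversion explicit (assuming "10T" denotes a 10Mbit/s client link):

```python
# Link-rate arithmetic behind the disk-throughput numbers in Figure 9.
# T1 and T3 are the standard carrier rates; "10T" is assumed here to
# mean a 10 Mbit/s (10BaseT) client link.
T1_MBIT = 1.544     # Mbit/s
T3_MBIT = 44.736    # Mbit/s
TENT_MBIT = 10.0    # Mbit/s (assumed meaning of "10T")

def to_mbyte_per_s(mbit):
    return mbit / 8.0

print(round(to_mbyte_per_s(T1_MBIT), 3))    # ~0.193 MByte/s per T1 client
print(round(to_mbyte_per_s(TENT_MBIT), 3))  # 1.25 MByte/s: the ~1 MByte/s of case (b)
print(round(to_mbyte_per_s(T3_MBIT), 3))    # ~5.592 MByte/s: the ~5.6 MByte/s of case (d)
```

Note that the ~5.6MByte/s observed in the mixed-rate case (d) matches a single T3-rate stream, consistent with the fastest client driving the shared disk stream.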
This section investigates how the file size affects the system performance. Figure 11(a) shows the server network throughput for file sizes between 256MB and 1GB. We observe that, with out-of-order transfers, the network throughput remains above 50MByte/s, consistent with the 40% offered load. Sequential transfers cause the network throughput to drop below 20MByte/s, approaching the disk throughput. As a result, the download latency (not shown) of sequential transfers increases dramatically, to several tens of minutes. In the out-of-order case, all downloads complete within a few minutes at all the file sizes that we examined.
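A quick sanity check on the "few minutes" claim: when the client link is the only bottleneck, the download time is just the file size divided by the link rate. The sketch below assumes 10Mbit/s clients (as in the multiple-file experiment that follows):

```python
# Back-of-the-envelope latency floor for an out-of-order download
# (assumption: the 10 Mbit/s client link is the only bottleneck).
FILE_MB = 512           # MByte
CLIENT_MBYTE_S = 1.25   # 10 Mbit/s client link

latency_s = FILE_MB / CLIENT_MBYTE_S
print(latency_s, "s =", round(latency_s / 60, 1), "min")  # ~6.8 minutes
```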
Even though it is likely that only a few files will be in heavy demand by the clients, we investigate how the system performance is affected when the number of popular files increases. We consider 1 to 16 different files of 512MB each, all stored on a single server disk and requested with equal probability. The clients receive data over 10Mbit/s links, and the system runs at 40% load. Figure 12(a) illustrates the network throughput of the server with sequential and out-of-order transfers, respectively. In the out-of-order case, the measured throughput remains roughly 50MByte/s with up to 8 files, and drops slightly to 48MByte/s with 16 files. From Figure 12(b), the average disk throughput increases linearly with the number of files up to eight, and reaches 10MByte/s at 16 files. This behavior is expected: the number of disk access streams grows with the number of active files, and the disk begins to limit the system as its throughput approaches 10MByte/s. With sequential transfers, the disk throughput always limits the system, and performance only worsens as the number of files increases.
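The linear growth in disk throughput follows from a simple model, sketched below: each active file contributes roughly one independent disk stream at the client link rate, until the aggregate demand hits the disk bound (the 10MByte/s figure cited above; in this simple model demand saturates at eight files, slightly earlier than the measurements):

```python
# Simple model of disk demand vs. number of popular files (Figure 12(b)).
# Each active file adds one disk stream at the client link rate of
# 10 Mbit/s = 1.25 MByte/s, up to the disk bound cited in the text.
STREAM_MBYTE_S = 1.25   # per-file disk stream (10 Mbit/s clients)
DISK_LIMIT = 10.0       # MByte/s

def disk_demand(nfiles):
    return min(nfiles * STREAM_MBYTE_S, DISK_LIMIT)

for n in (1, 4, 8, 16):
    print(n, disk_demand(n))  # demand grows linearly, then caps at 10 MByte/s
```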
Packet loss rate and propagation delay can vary significantly in a wide-area network, depending on the physical span and the operating conditions of the network. We investigated the impact of these factors on file transfers by experimenting with round-trip times of about 1 and 75 ms, and with packet loss rates of up to about 10%, using Dummynet. In Figure 13, we measure the download time and server miss ratio when transferring a 512MB file over T1 and T3 links from the same server. When a packet loss rate of 10% and a delay of 75ms are combined in out-of-order transfers, the download time over T3 links increases by an order of magnitude, approaching the level of sequential transfers. This ten-fold increase from the base case can be attributed to the mechanism that the congestion avoidance algorithm uses to recover the congestion window at the sender. Longer round-trip delays increase the recovery time and the wasted network bandwidth. This follows from the TCP operation: packet losses lead to triple duplicate acknowledgments (rather than timeouts), and the congestion window then increases by at most one data segment every round-trip time [21]. Individual sequential transfers have low throughput due to the disk bottleneck, and are not affected further at low load. However, raising the system load from 10% to 30% doubles the time of T3 sequential transfers, while leaving the out-of-order transfer time almost unchanged. When delay and loss are combined with out-of-order transfers, the disk throughput drops because data retransmissions hit in the buffer cache. We do not observe similar effects for sequential transfers, which provides additional evidence of the poor disk access locality of this policy.
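The RTT sensitivity described above can be quantified with a rough model of window recovery: after a fast retransmit, the congestion window is halved and then grows by one segment per round-trip time, so recovery takes on the order of W/2 round trips. The window size below is an assumed value for illustration only:

```python
# Rough model of congestion-window recovery after a fast retransmit:
# the window drops from W to W/2, then grows by one segment per RTT
# (additive increase), taking about (W/2) * RTT to recover.
def recovery_time_s(window_segments, rtt_s):
    """Time to climb back from W/2 to W at one segment per RTT."""
    return (window_segments / 2) * rtt_s

# Assumed window of 64 segments, for illustration:
print(recovery_time_s(64, 0.001))  # ~0.032 s at a 1 ms RTT
print(recovery_time_s(64, 0.075))  # ~2.4 s at a 75 ms RTT
```

With frequent losses, a 75x longer RTT thus stretches every recovery episode by the same factor, which is consistent with the order-of-magnitude slowdown observed over the lossy, high-delay T3 links.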
This section examines how sensitive the system behavior is to important configuration parameters. We performed extensive experiments to ensure that the system remains robust across a wide range of workloads, but include only a few representative measurements here. Overall, the system behavior is affected by the configuration parameters below, but remains stable as long as the parameters stay within the ranges that we suggest.