Many Internet services are based on whole-file transfers or file downloads. For our purposes, the file download primitive has three defining characteristics: (i) each client initiates transfer of an entire file, rather than a block or fragment, (ii) the client postpones interpretation of the data until the transfer completes, and (iii) the transfer may occur as rapidly as permitted by the server, the client, and the network resources. Whole-file transfer is a building block of peer-to-peer (P2P) file sharing, the Web, media services, and data grids [2].
Conventional file download protocols such as FTP or HTTP transfer each file as an ordered stream of bytes, and use the underlying transport protocol (e.g., TCP) to deliver data in sequence. However, the semantics of file download permit the server to deliver data to each client in any order, as long as the client eventually receives the entire file. Reordering relies on the client to reassemble the content, as in P2P ``content swarming'' systems that disperse file blocks across multiple servers (such as BitTorrent [14]).
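To make the client's role concrete, the following sketch outlines one way a download client could reassemble blocks that arrive in arbitrary order. It is only an illustration: the \texttt{recv\_block} helper and its (offset, data) framing are assumed for the example and are not part of any protocol described in this paper.

\begin{verbatim}
# Illustrative sketch: reassemble file blocks that may arrive in any order.
# recv_block() is an assumed helper returning one (offset, data) pair.
def download(sock, path, file_size, block_size):
    remaining = set(range(0, file_size, block_size))  # offsets still missing
    with open(path, "wb") as f:
        f.truncate(file_size)          # preallocate so any offset is writable
        while remaining:
            offset, data = recv_block(sock)
            f.seek(offset)
            f.write(data)              # place the block at its file offset
            remaining.discard(offset)  # done when every offset has arrived
\end{verbatim}

The client needs only enough state to track which offsets are still missing; the server remains free to choose whatever block order best suits its cache and disk.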
This paper investigates opportunistic block reordering as a technique to improve the cache effectiveness of a content server. More effective caching can increase the server throughput for a given disk subsystem, and decrease the disk and memory cost needed to fill the server network link. Block reordering complements well-known techniques to improve cache effectiveness with improved replacement algorithms (e.g., [20]) or integrated caching and prefetching [10]. Most obviously, it changes the block access sequence to suit the needs of the storage system, instead of just managing the storage cache to suit a specific block access sequence. In essence, block reordering extends the server memory by using the client memory as a reassembly buffer.
Block reordering is helpful primarily when multiple clients request large shared files concurrently. Workload studies have shown this case to be quite common and important for overall performance. Typically a modest percentage of popular files receives most of the requests at a content server, and larger files account for a disproportionately large share of the data traffic [5,7,11,3,27]. A recent workload characterization of a popular P2P system concluded that 42% of the data requested from a typical academic site involved transfers of a few hundred large objects with an average size of several hundred megabytes [27]. In that study, large-object caching alone could yield a byte hit ratio as high as 38%. Block reordering is especially promising in content networks that employ content-based request routing to concentrate the requests for each file on a small set of servers; recent studies show large improvements in server cache locality from this technique [22,30,16].
When files are small, it is sufficient to cache them in their entirety to capture most of the benefit from sharing. But caching of large files is less effective: they consume more cache space and are therefore more vulnerable to eviction before the next request can generate a cache hit. As file size increases relative to the server memory, hit ratios degrade and the rate of disk accesses limits server throughput [6,27]. Unfortunately, ongoing advances in disk bandwidth, driven by increasing areal density and faster rotation speeds, do not help appreciably. In fact, seek overheads dominate because concurrent requests from a large number of clients tend to destroy the inherent sequentiality of block accesses to each file, producing a block access sequence that is effectively random. Since seek times improve more slowly than disk bandwidth, faster disks actually make the problem worse: seek overheads consume an ever-larger share of the arm time.
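As a rough, purely illustrative calculation (the figures below are assumed round numbers, not measurements from this paper): with an average positioning cost of about 8 ms per random access and a peak transfer rate of about 50 MB/s, a random 64 KB block read costs roughly
\[
8\ \mathrm{ms}\ (\mathrm{positioning}) \; + \; \frac{64\ \mathrm{KB}}{50\ \mathrm{MB/s}} \;\approx\; 8\ \mathrm{ms} + 1.3\ \mathrm{ms} \;\approx\; 9.3\ \mathrm{ms},
\]
so the disk delivers only about $64\ \mathrm{KB} / 9.3\ \mathrm{ms} \approx 7$ MB/s, roughly one seventh of its sequential rate, and most of the arm time is spent positioning rather than transferring data.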
Several other techniques are directed at serving large shared content files, primarily for continuous media. One solution is to abandon caching and use large disk transfers to reduce seeking, a technique that is well-studied for streaming video servers (e.g., [4]). But buffer demands increase linearly with transfer size and the number of clients. A second alternative is to encode the shared files using forward error correcting codes (FEC) [9,26], e.g., the Tornado codes used by Digital Fountain. FEC yields optimal caching effectiveness, since every receiver can benefit from each block fetched from the disk. Unfortunately, FEC increases the volume of data needed to store and transmit a stream by as much as an order of magnitude, depending on the skew of client transfer rates. Other techniques include stream merging methods [17,18]. Section 6 discusses related work in more detail.
The rest of this paper is organized as follows. Section 2 uses a simple model to explore the performance behavior of file download servers and motivate opportunistic block reordering. Section 3 proposes a simple parameterized block selection algorithm for reordering. Section 4 describes its implementation in the Circus content server, and Section 5 presents experimental results. Section 6 sets opportunistic block reordering in context with previous work, and Section 7 summarizes our conclusions.