
Controlled Experiments

We started the evaluation of the implementation with two simple, controlled experiments. The following questions motivated our experiments:

The first experiment is a shell script that consists of tens of filter programs, each of which reads one input file whose size is an experimental parameter, performs some transformation, and writes one output file. Many well-known UNIX programs are filters; among those included in our script are awk, compress, sed, sort, strings, uniq, uuencode, and members of the grep family. All input files are the same size and reside remotely.
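As an illustration only, such a script has the following shape; the file name, patterns, and option flags below are placeholders rather than the exact script we ran:

    #!/bin/sh
    # Each filter reads the (remotely stored) input file, performs a
    # transformation, and writes one output file.  IN stands in for the
    # parametric input file; patterns and flags are illustrative.
    IN=/remote/input.dat

    grep pattern       $IN > out.grep
    sort               $IN > out.sort
    uniq               $IN > out.uniq
    awk '{ print $1 }' $IN > out.awk
    sed 's/foo/bar/'   $IN > out.sed
    strings            $IN > out.strings
    uuencode $IN data       > out.uu
    compress -c        $IN > out.Z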

 

Table 3: Filters Experiment. This table summarizes the performance results of the filters experiment. Part (a) presents the results with the 10 Mb/sec wired link; part (b) with the 2 Mb/sec wireless link.

The second experiment is composed of several program builds. Although builds may seem ideally suited to prefetching, they are antagonistic, for two reasons. First, on our hardware platform a build is CPU intensive, leaving the prefetch mechanism relatively few spare CPU cycles with which to make prefetch decisions. Second, compilations often access header files in rapid succession, meaning that the time between a prefetch decision and the actual need for the data may not be sufficient to complete the prefetch I/O.

The experiments were run standalone. Both the client and the server are 486-based machines with 16MB of memory. The client dedicates 1.57MB to its UNIX buffer cache. Since we are interested in how our mechanism behaves under different network bandwidths, we ran the tests over both hardwired and wireless links directly connecting the client and the server. The hardwired connection is an Ethernet (10 Mb/sec), while the wireless link is an NCR ``WaveLAN'' [26] radio link with a maximum 2 Mb/sec data rate. For each combination of experiment and network link, we ran the tests both with and without prefetching. Each number reported below is the mean of three trials.
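For concreteness, the measurement procedure can be pictured as the driver sketch below. The names prefetchctl, flush_caches, and run_filters.sh are hypothetical and used only for illustration; in practice the physical link was changed between sets of runs rather than selected by a loop.

    #!/bin/sh
    # Hypothetical driver: for each link and prefetch setting, run the
    # experiment three times; the numbers reported are means of the trials.
    for link in ethernet wavelan; do          # 10 Mb/sec wired, 2 Mb/sec wireless
        for mode in off on; do
            prefetchctl $mode                 # hypothetical: turn prefetching on or off
            for trial in 1 2 3; do
                flush_caches                  # hypothetical: empty buffer and name caches
                /bin/time ./run_filters.sh 2>> results.$link.$mode
            done
        done
    done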

 

Figure 7: Filters Experiment: Comparison of Application Latency. These graphs illustrate the latency data in Table 3. Part (a) compares latency in the wired setting; part (b) in the wireless setting. Prefetching results in significant speedup in this experiment.

Table 3 shows the results when the filters experiment is run with a wired link. We varied the workload by using different input file sizes, as given in the ``File Size'' column. Size is measured in multiples of the server's preferred NFS block size, which is 4KB. For each workload, we list the UNIX buffer cache miss rate as well as the directory name lookup cache miss rate. Since our mechanism deals exclusively with network files, the cache miss rates are those of remote entries. The reduction of the miss rates due to prefetching is substantial. We also present measurements of application latency, or total elapsed time, which could not be obtained from the simulation. From a user's standpoint, latency is the most important performance metric. The time measurements are in seconds, with standard deviations given in parentheses. The ``Speedup'' column is the ratio of the no-prefetching latency to the prefetching latency. The speedups are significant, particularly when the input files are relatively small, since the delay caused by a sequential file access lies mainly in the first one or two block reads. The remaining blocks, if any, are brought into the cache by the NFS readahead logic before they are needed.
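Stated as a formula, the entry in that column is simply

    speedup = (latency without prefetching) / (latency with prefetching),

so a value greater than 1 indicates that prefetching reduced the elapsed time.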

Table 3(b) shows the results when the same experiment is run over the wireless link. The application latency is larger because of the slower link, but the speedups are comparable to those in the wired setup. The cache miss rates with prefetching are slightly higher over the wireless link than over the wired link because a small number of prefetch operations cannot be completed before the data is demanded. Nevertheless, the bandwidth of the wireless link is adequate to perform NFS readahead on a timely basis, so it still suffices for the prefetcher to read only the first file block for each prediction it makes.

Figure 7 graphically illustrates the application latency of the filters experiment.

 

Table 4: Builds Experiment. This table summarizes the performance results of the builds experiment. Part (a) presents the results with the 10 Mb/sec wired link; part (b) with the 2 Mb/sec wireless link.

 

Figure 8: Builds Experiment: Comparison of Application Latency. These graphs illustrate the latency data in Table 4. Part (a) compares latency in the wired setting; part (b) in the wireless setting. There are no observable negative effects in this antagonistic experiment.

Our second experiment consists of builds of several UNIX utility programs. Table 4 contains the results of these tests. The application latency is also illustrated in Figure 8. When a large number of header files are opened in quick succession -- common during compilation -- the prefetcher often does not have enough time or available CPU cycles to make a fruitful prefetch, even though it can accurately predict which files will soon be referenced. The relatively small improvements in the name cache miss rate confirm this. However, prefetching still manages to reduce the buffer cache miss rate by 65% to 85%. There is no significant latency improvement, suggesting that total file read time does not dominate the application latency. We are pleased to see no negative effects in an adverse case such as this.

 

Table 5: CPU Time Consumption. This table presents the CPU time with and without prefetching, over two different network links. Also given is the ratio of CPU time to application latency. Part (a) gives the results for the filters experiment; part (b) for the builds experiment.

We have used simulation traces to demonstrate the prefetch overhead in terms of wasted network bandwidth and server capacity. The implementation enables us to collect data on another type of cost: extra CPU consumption. Table 5 presents the results for the two experiments. The CPU time includes both system time and user time. Again, the time measurements are in seconds and the standard deviations are given in parentheses. The CPU time overhead is negligible in all cases. We also show the ratio of CPU time to application latency (CPU/latency). It comes as no surprise that prefetching increases this ratio, substantially so in the filters experiment. A prefetcher reduces the total elapsed time by increasing the overlap between CPU processing and I/O.
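A simplified way to see why the ratio rises (an illustrative model, not something we measured): if application latency is approximated as the CPU time plus the time the application spends blocked on I/O, then

    CPU/latency  ~=  T_cpu / (T_cpu + T_blocked_io).

Prefetching leaves T_cpu nearly unchanged, as the overhead figures show, while overlapping I/O with computation shrinks T_blocked_io; the ratio therefore grows even though the absolute CPU cost hardly changes.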


