This paper reports the effectiveness of a predictive file prefetching technique that operates automatically, without any information supplied by applications or users. The technique, which is incorporated into the client's cache manager, issues extra requests to the server, ideally in advance of the application's actual need for the data. Prefetched data is then placed in the client's cache.
We hypothesize that there is pronounced regularity in file access patterns, and that relatively simple algorithms can identify an access pattern and quickly spot it when it re-emerges during a later run of the application. In particular, we build semantic data structures, called access trees, that capture potentially useful information about the interrelationships and dependencies among files. An access tree for a program records all the files referenced during one execution of the program. For each program, we maintain several access trees in virtual memory, each representing a distinct file usage pattern. When a program is re-executed, we compare the access tree being formed by current activity against the saved access trees to determine which usage pattern, if any, is recurring; we then prefetch the files remaining in the saved access tree.
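The matching step above can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the class name, the prefix-agreement score, and the flat file list standing in for a tree are all assumptions made for brevity.

```python
# Illustrative sketch of matching a partially-formed access pattern
# against saved access trees; the scoring rule is an assumption.

class AccessTree:
    """Records the files referenced during one run of a program."""
    def __init__(self, files):
        self.files = list(files)  # files in order of first reference

def best_match(saved_trees, observed):
    """Return the saved tree whose recorded accesses best agree with
    the accesses observed so far, or None if nothing matches."""
    def score(tree):
        # count positions where the observed run follows the saved pattern
        return sum(1 for a, b in zip(tree.files, observed) if a == b)
    if not saved_trees:
        return None
    best = max(saved_trees, key=score)
    return best if score(best) > 0 else None

def prefetch_candidates(saved_trees, observed):
    """Files remaining in the matched saved tree: these are the
    prefetch candidates for the recurring usage pattern."""
    match = best_match(saved_trees, observed)
    if match is None:
        return []
    seen = set(observed)
    return [f for f in match.files if f not in seen]
```

For example, if a compiler's two saved patterns begin with the same source file but diverge afterward, observing the first two accesses of a new run is enough to select one pattern and prefetch its remaining files.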
File prefetching brings two major advantages. First, applications run faster because they achieve higher file cache hit rates. Second, less ``burst'' load is placed on the network, because prefetching is performed when network bandwidth is available rather than on demand. On the other hand, prefetching has two main costs. The first is the CPU cycles the client expends in determining when and what to prefetch: cycles are spent both on gathering the information necessary to make prefetch decisions and on carrying out the prefetches themselves. The second is the network bandwidth and server capacity wasted when prefetch decisions inevitably prove less than perfect.
We have conducted our study in the UNIX environment, which is ubiquitous in academia and the research community. It has been well recognized that on UNIX most files are accessed in their entirety and sequentially [19, 2]. We take advantage of this phenomenon in two ways. First, we effectively model file system events at the level of whole files, and look for access patterns across files. Second, if a file is predicted, we prefetch the initial portion of the file only, letting the standard sequential readahead mechanism bring in the rest when the file is demanded.
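The second point above, prefetching only a file's initial portion, can be sketched as follows. This is a hedged illustration only: the block size, the dictionary cache, and the `read_from_server` callback are hypothetical stand-ins for the client cache manager's real interfaces.

```python
# Illustrative sketch: a predicted file gets only its first block
# prefetched into the client cache; the standard sequential readahead
# mechanism brings in later blocks once the file is actually demanded.

BLOCK_SIZE = 8192  # assumed client cache block size

def prefetch_file(cache, read_from_server, path):
    """Place block 0 of a predicted file in the client cache.

    cache            -- assumed block cache keyed by (path, block number)
    read_from_server -- assumed callback (path, offset, length) -> bytes
    """
    key = (path, 0)
    if key not in cache:
        cache[key] = read_from_server(path, 0, BLOCK_SIZE)
```

Because most UNIX files are read sequentially from the start, caching just the first block converts the file's first demand read into a hit while bounding the bandwidth wasted on a mispredicted file to a single block.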
Section 2 details the prefetching mechanism. Section 3 reports results from a trace-driven simulation. Section 4 describes the implementation and presents some initial performance data. Section 5 discusses related work.