We have presented a technique for transparent on-line file prefetching. The technique analytically models file-related system calls and builds semantic structures that capture the intrinsic correlations among file references. It makes accurate predictions of future file accesses, imposes little CPU overhead, defers to demand I/O, and delivers substantially lower client cache miss rates and shorter elapsed times for I/O-intensive applications.
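To make the observe-and-predict cycle concrete, the following is a minimal sketch, not the algorithm described above: it reduces the semantic structures to a simple successor-frequency table keyed on the most recently opened file. The class name, method names, and file names are all hypothetical.

from collections import Counter, defaultdict

class AccessPredictor:
    """Hypothetical stand-in for the semantic structures: a first-order
    successor-frequency table over observed file opens."""

    def __init__(self):
        self.successors = defaultdict(Counter)  # file -> files opened next, with counts
        self.last = None                        # most recently opened file

    def observe_open(self, path):
        # Record one file-open event (e.g. hooked from the open() system call).
        if self.last is not None:
            self.successors[self.last][path] += 1
        self.last = path

    def predict(self, k=2):
        # Return up to k files likely to be opened next, best first;
        # these are candidates to prefetch into the client cache.
        if self.last is None:
            return []
        return [p for p, _ in self.successors[self.last].most_common(k)]

p = AccessPredictor()
for f in ["main.c", "main.h", "util.c", "main.c", "main.h"]:
    p.observe_open(f)
print(p.predict())  # ['util.c']

A first-order table captures only one step of context; the actual structures condition on longer access histories, which is what enables the deeper lookahead discussed next.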
One central trait of the algorithm is that it spends client CPU cycles in return for more effective use of client cache space and fewer on-demand network operations. Another distinguishing aspect is that the algorithm's lookahead ability is potentially much greater than that of previous work. Both of these traits help to couple application I/O performance more closely to CPU speed than to I/O device speed, thereby addressing a fundamental and longstanding problem in operating systems.
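The deference of prefetching to demand I/O can likewise be illustrated with a small sketch, assuming a two-level priority queue in which speculative reads always sort behind demand reads, so that prefetching consumes only otherwise-idle device time. The priority scheme and all names here are illustrative assumptions, not our implementation.

import heapq
import itertools

DEMAND, PREFETCH = 0, 1      # lower value is served first
_seq = itertools.count()     # FIFO tie-breaker within a priority level
queue = []

def submit(path, priority):
    # Queue a read request; demand reads always sort ahead of prefetches.
    heapq.heappush(queue, (priority, next(_seq), path))

def next_request():
    # Pop the next request to issue to the device, or None if idle.
    return heapq.heappop(queue)[2] if queue else None

submit("a.dat", PREFETCH)    # speculative read, issued while the device is idle
submit("b.dat", DEMAND)      # the application is blocked on this read
print(next_request())        # b.dat -- demand I/O is served first
print(next_request())        # a.dat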
Our initial performance evaluation has been encouraging. We intend to extend the evaluation in several directions. First, we would like to run experiments that model user behavior more realistically; we are in the process of synthesizing workloads based on actual file system traces. Second, we plan to conduct experiments over a much wider spectrum of network and client capacities, in the hope of reaching a good understanding of when prefetching is feasible. Third, we would like to examine the applicability of our prefetching algorithm to other kinds of file systems: local file systems, remote file systems that cache file blocks in main memory only, and remote file systems with whole-file caching on client disks.