Every program can invoke other programs. In UNIX-style operating systems, this is usually realized by the executing program forking child processes, which in turn execute other programs. The programs may also open some data files. All the file references in an application can be formulated into a tree data structure, dubbed an access tree. Files (program files and data files) are the nodes, and an edge is drawn from parent A to child B if either (1) program A invokes program B or (2) program A opens data file B. The order of siblings reflects the chronology of file accesses.
Figure 1: Example Access Trees.
These graphs show the access trees generated from the events
described in Section 1.2.
Graph (a) shows the version before
compression. Graph (b) that after both vertical and horizontal compression,
assuming E is a shell program.
Figure 1(a) depicts the access tree for an application A that includes the following activities:
An access tree is subject to two kinds of compression. Vertical compression draws edges through UNIX shells, effectively cutting them out of the access tree. This is necessary because of the role shells play as command interpreters. Shells are invoked in a variety of circumstances and can generate a large number of file usage patterns. Since we perform file prefetching for each program file in the access tree and it is infeasible to prefetch accurately for shells, we choose to ignore them. Horizontal compression removes consecutive accesses of the same data file. The detail of consecutive accesses offers no help in prefetching and can be safely omitted. However, non-consecutive accesses of the same data file are preserved for use in a later phase. Assuming program E is a shell in the previous example, Figure 1(b) shows the access tree after compressions.
An access tree describes the context in which file references are incurred in an application. We feel that this context information, if made available to the prefetching mechanism, would decrease the apparent randomness of file system events and permit more accurate predictions of future file references.
As an application proceeds, we construct an access tree for it by intercepting fork, execve, open, chdir and exit system calls. Execve and open calls deal with references of program files and data files respectively. Fork provides information on how processes, or program executions, are associated with each other. Information on chdir calls is used to resolve relative pathnames of files. The access tree is completed upon exit of the program execution. An access tree that is being constructed by current activity is called a working tree; a finished access tree that is saved to exemplify a file usage pattern is called a pattern tree.