We now take a closer look at the compatibility function. This function, essentially a similarity metric between access trees, abstracts out the complexities of prefetch analysis and pattern tree maintenance. We illustrate the definition with the example in Figure 2. A pattern tree is shown in 2(a).
Figure 2: Computation of Compatibility. Graph (a) shows a pattern tree. Graph (b) shows a finished working tree, bearing a 0.575 compatibility with the given pattern tree. Graph (c) shows a unfinished working tree, so far bearing a 0.5 compatibility with the pattern tree; E is the pivot in the pattern tree, since it corresponds to the latest child node in the working tree being formed.
Let us first consider the case that the working tree is a finished one. We try to pair up the immediate child nodes in the working tree with identical child nodes in the pattern tree, preserving the order of the nodes. Recall that a child node can represent either an executable file, which may root another access tree, or a data file. For the two trees to be considered to describe the same file access pattern, we require that there be a one-to-one correspondence between all the executable files, but not all the data files. However, the same executable file may root a different access subtree in the working tree than in the pattern tree. We define as the percentage of data files that can be paired up, as the percentage of pairs of executables that root the same access subtree. Intuitively, suggests how ``compatible'' the data files are, suggests how ``compatible'' the executable files are. We choose to use the average of the two values as the compatibility of the two trees in question.
Let us consider the finished working tree in Figure 2(b). There are three data files in the working tree and two in the pattern tree. Out of these five files, only the two appearances of E can be paired up, making 0.4. Three out of four executable pairs (B, F and G) root identical access subtrees, so is 0.75. The compatibility is thus 0.575.
The case of an unfinished working tree is similar except that one child in the pattern tree is first determined to be the pivot node. The pivot corresponds to the most recently added child node in the working tree. Only the child nodes in the pattern tree that appear before the pivot are involved in compatibility computation. If the pattern tree is selected, we prefetch those files that follow the pivot in the sequence of pattern tree preordering, since those before the pivot probably have already been accessed. Given the unfinished working tree in Figure 2(c), node E is the pivot in the pattern tree. and are both 0.5, giving rise to a 0.5 compatibility. If we decide to prefetch this pattern tree, only the files in subtrees and will be prefetched.
Since the compatibility function is invoked often, it is important that it not be expensive. That is why we examine only immediate child nodes. The time complexity of the computation is proportional to the number of child nodes.