We now weave together the above analysis and insights into a concrete, adaptive (self-tuning) algorithm that dynamically balances the cache space allocated to RANDOM and SEQ data under different conditions such as different number of sequential/random streams, different data layout strategies, different think times of sequential/random streams, different cache-sensitivities/footprints of random streams, and different I/O intensities. The complete algorithm is displayed in Figure 4. SARC is extremely simple to implement and has virtually no computational/space overhead over an LRU-based implementation.
In our experiments, we have set to be of the cache space. Any other small fraction would work as well.
In the algorithm, we will need to determine if a hit in one of the lists was actually in its bottom (see lines 6 and 12). Of course, one could maintain separate fixed sized bottom lists for both the lists. However, with an eye towards simplification and computational efficiency, we now describe a time-based approximation to achieve nearly the same effect. We now describe how to determine if the a hit in SEQ lies in its bottom . Let and denote the time stamp of the MRU and LRU tracks, respectively, in SEQ. Let denote the size of SEQ in pages. Let denote the time stamp of the hit track. Now, if
The algorithm uses the following constants: (lines 18 and 32) denotes the degree of synchronous and asynchronous read-ahead; (lines 18 and 32) denotes the number of disks in a RAID group; (line 49) is the offset from the end of a prefetched group of tracks and is used to demarcate the asynchronous trigger; LargeRatio (line 13) is a threshold, say, , to indicate that the ratio has become much larger than ; and FreeQThreshold (line 52) denotes the desired size of the FreeQ. Shark uses RAID arrays (either RAID-5 or RAID-10) at the back-end. Our machine (see Section IV-B) was configured with RAID-5 (6 data disks + parity disk + spare disk) meaning that . RAID allows for parallel reads since logical data is striped across the physical data disks. To leverage this, setting as a multiple of is meaningful. We set . Finally, we chose to be .
The algorithm is self-explanatory. We now briefly explain important portions of the algorithm in Figure 4 in a line-by-line fashion.
Lines 1-3 are used during the initialization phase only. The counter tracks the number of sequential misses between two consecutive bottom hits in RANDOM, and is initialized to zero. The variable is the desired size of SEQ, and is initially set to zero meaning adaptation is shut off. The adaptation will start only after SEQ is populated (see lines 69-73). The variable determines the instantaneous magnitude and direction of the adaptation to .
Lines 4-50 describe the cache management policy. The quantity in line 4 is derived from (6). Lines 5-10 deal with the case when a track in RANDOM is hit. If the hit is in the bottom portion of the RANDOM list (line 6), then is reset to zero (line 7) since we are interested in number of sequential misses between two successive hits in the bottom of RANDOM. Line 8, sets the variable
Lines 11-27 deal with the case when a track in SEQ is hit. If the hit is in the bottom portion of the SEQ list (line 12) and has become large (line 13), in other words, no hit has been observed in the bottom of the RANDOM list while comparatively large number of sequential misses have been seen on the SEQ list, then set to (line 14) meaning that increase at the fastest rate possible. Now, if the hit track is an asynchronous trigger track (line 17), then asynchronously read-ahead the next sequential group of tracks (line 18). Lines 21-27 describe how the sequential detection mechanism in Section II-A is implemented.
Lines 28-40 deal with a cache miss. For a sequential miss (lines 29-31), synchronously read-ahead a sequential group of tracks (line 32). The remaining lines deal with sequential detection mechanism in Section II-A.
Lines 41-50 (i) read the missing tracks from a given range of tracks; (ii) places all tracks in the given range at the MRU position; and (iii) set the asynchronous trigger.
Lines 51-73 implement the cache replacement policy and carry out the adaptation. As is typical in multi-threaded systems, we assume that these lines run on a separate thread (line 51). If the size of the free queue drops below some predetermined threshold (line 52), then tracks are evicted from SEQ if it exceeds and tracks are evicted from RANDOM otherwise. In either case, the evicted tracks are placed on the free queue. Observe that SARC becomes active (lines 60-64) only if the sizes of both the SEQ and the RANDOM list exceed . Otherwise, a simple LRU eviction (lines 54-58) is done. Whenever the utility of one of the two lists becomes so small when compared to the utility of the other list that its size eventually drops below , we do not want to waste even amount of cache, and revert back to LRU. Whenever both list sizes exceed , SARC takes over. Finally, lines 68-73 evict the LRU track from the desired list, and effect an adaption as already described above.
Our description of SARC is now complete.