Many studies indicate that requests to static Web objects follow a Zipf-like popularity distribution [11,38]. The probability $p_x$ of a request to the $x$th most popular object is proportional to $1/x^{\alpha}$, for some parameter $\alpha$. Many requests target the most popular objects, but the distribution has a heavy tail of unpopular objects with poor reference locality. Higher values of $\alpha$ increase the concentration of requests on the most popular objects. We assume that object size is independent of popularity [11], and that size distributions are stable for each service [35].
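To make the distribution concrete, the following Python sketch (the helper name `zipf_pmf` and the parameter values are ours, purely for illustration) computes the normalized request probabilities $p_x \propto 1/x^{\alpha}$ and shows how raising $\alpha$ concentrates the request mass on the most popular objects:

```python
import numpy as np

def zipf_pmf(n_objects, alpha):
    """Probability p_x ~ 1/x^alpha of a request to the x-th most popular
    of n_objects, normalized so the probabilities sum to 1."""
    ranks = np.arange(1, n_objects + 1)
    weights = ranks ** -alpha
    return weights / weights.sum()

# Higher alpha concentrates requests on the most popular objects:
for alpha in (0.6, 0.8, 1.0):
    p = zipf_pmf(100_000, alpha)
    print(f"alpha={alpha}: top 1% of objects draws {p[:1000].sum():.0%} of requests")
```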
Given these assumptions, a utility OS can estimate hit ratio for a memory size $M$ from two parameters: $\alpha$, and $T$, the total size of the service's data (consider $M$ and $T$ to be in units of objects). If the server effectively caches the most popular objects (i.e., assuming perfect Least Frequently Used or LFU replacement), and ignoring object updates, the predicted object hit ratio $H$ is the portion of the Zipf-distributed probability mass concentrated in the $M$ most popular objects. We can closely approximate $H$ by integrating over a continuous form of the Zipf probability distribution function [11,38]. The closed form solution reduces to:
$$H \;\approx\; \frac{M^{1-\alpha} - 1}{T^{1-\alpha} - 1} \qquad (1)$$
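Evaluating Equation (1) directly is straightforward; the sketch below (the function name `predicted_hit_ratio` is ours) is one minimal rendering. Note that the closed form assumes $\alpha \neq 1$; at $\alpha = 1$ the integral is logarithmic and the ratio reduces to $\ln M / \ln T$:

```python
import math

def predicted_hit_ratio(M, T, alpha):
    """Equation (1): fraction of the Zipf request mass covered by caching
    the M most popular of T total objects, assuming perfect LFU replacement.
    Handles the alpha == 1 limit, where the integral is logarithmic."""
    if abs(alpha - 1.0) < 1e-9:
        return math.log(M) / math.log(T)
    return (M ** (1 - alpha) - 1) / (T ** (1 - alpha) - 1)

# e.g., caching 10% of a 1M-object service at alpha = 0.8:
print(predicted_hit_ratio(M=100_000, T=1_000_000, alpha=0.8))  # ~0.61
```

The example illustrates the practical consequence of the heavy-tailed shape: a cache holding a modest fraction of the objects can capture well over half of the request mass at typical values of $\alpha$.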
Zipf distributions appear to be common in a wide range of settings, so this model is more generally applicable. While pure LFU replacement is difficult to implement, a large body of Web caching research has yielded replacement algorithms that approximate LFU; even simple schemes such as LRU are qualitatively similar in their behavior.
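As a rough sanity check on that last claim, here is a minimal simulation sketch (all parameter values are illustrative assumptions): it drives a plain LRU cache with a synthetic Zipf-like request stream and compares the measured hit ratio against the perfect-LFU prediction of Equation (1). On such traces LRU typically lands somewhat below the model but tracks it qualitatively:

```python
from collections import OrderedDict
import numpy as np

def lru_hit_ratio(trace, cache_size):
    """Object hit ratio of plain LRU replacement over a request trace."""
    cache, hits = OrderedDict(), 0
    for obj in trace:
        if obj in cache:
            hits += 1
            cache.move_to_end(obj)          # mark as most recently used
        else:
            if len(cache) >= cache_size:
                cache.popitem(last=False)   # evict least recently used
            cache[obj] = None
    return hits / len(trace)

T, M, alpha = 10_000, 1_000, 0.8            # illustrative sizes, in objects
ranks = np.arange(1, T + 1)
p = ranks ** -alpha
p /= p.sum()                                 # Zipf-like request probabilities

rng = np.random.default_rng(0)
trace = rng.choice(T, size=200_000, p=p)     # synthetic request stream

model = (M ** (1 - alpha) - 1) / (T ** (1 - alpha) - 1)  # Equation (1)
print(f"perfect-LFU model: {model:.2f}")
print(f"measured LRU:      {lru_hit_ratio(trace, M):.2f}")
```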