Many studies indicate that
requests to static Web objects follow a Zipf-like popularity
distribution [11,38].
The probability $p_x$ of a request to the $x$th most popular object is proportional
to $1/x^{\alpha}$, for some parameter $\alpha$.
Many
requests target the most popular objects,
but the distribution has a heavy tail of unpopular objects
with poor reference locality. Higher $\alpha$ values increase the
concentration of requests on the most popular objects. We
assume that object size is independent of popularity [11],
and that size distributions are stable for each service [35].
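As a concrete illustration (our own sketch, not from the cited studies; the function name `zipf_popularity` and the parameter values are illustrative), the following Python snippet builds the normalized Zipf-like probabilities and shows how larger $\alpha$ concentrates requests on the top-ranked objects:

```python
import numpy as np

def zipf_popularity(T, alpha):
    """Normalized request probabilities p_x proportional to 1/x^alpha
    for object ranks x = 1..T."""
    ranks = np.arange(1, T + 1)
    weights = ranks ** -alpha
    return weights / weights.sum()

# Higher alpha concentrates request mass on the most popular objects:
for alpha in (0.6, 0.9, 1.2):
    p = zipf_popularity(10_000, alpha)
    print(f"alpha={alpha}: top 1% of objects draw {p[:100].sum():.0%} of requests")
```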
Given these assumptions, a utility OS
can estimate hit ratio for a memory size M from two
parameters: $\alpha$, and T, the total
size of the service's data (consider M and T to be
in units of objects). If the server effectively
caches the most popular objects (i.e., assuming perfect
Least Frequently Used or LFU replacement), and ignoring object
updates, the predicted
object hit ratio H is given by the portion of the Zipf-distributed
probability mass that is concentrated in the M most popular
objects. We can closely approximate this H by integrating over a
continuous form of the Zipf probability distribution function [11,38].
The closed form solution reduces to:
$$H = \frac{M^{1-\alpha} - 1}{T^{1-\alpha} - 1} \qquad (1)$$
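To make Equation (1) concrete, the sketch below (ours; the `hit_ratio` helper and the parameter values are illustrative, and the $\alpha = 1$ special case follows from the same integral, where both antiderivatives become logarithms) evaluates the closed form and checks it against the exact discrete Zipf mass in the M most popular objects:

```python
import numpy as np

def hit_ratio(M, T, alpha):
    """Predicted object hit ratio H for an ideal LFU cache holding the
    M most popular of T objects, from the continuous Zipf approximation
    (Equation 1). For alpha == 1 the defining integral reduces to
    ln(M)/ln(T)."""
    if alpha == 1.0:
        return np.log(M) / np.log(T)
    return (M ** (1 - alpha) - 1) / (T ** (1 - alpha) - 1)

# Check the continuous approximation against the exact discrete Zipf mass:
T, alpha = 100_000, 0.9
p = np.arange(1, T + 1) ** -alpha
p /= p.sum()
for M in (100, 1_000, 10_000):
    print(f"M={M}: closed form {hit_ratio(M, T, alpha):.3f}, "
          f"discrete sum {p[:M].sum():.3f}")
```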
Zipf distributions appear to be common in many settings, so this model is more generally applicable. While pure LFU replacement is difficult to implement, a large body of Web caching research has yielded replacement algorithms that approximate LFU; even simpler schemes such as LRU are qualitatively similar in their behavior.
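The sketch below (again our illustration, with assumed parameter values) illustrates this last point: it simulates an LRU cache over a synthetic Zipf-like request trace and compares the measured hit ratio to the ideal-LFU prediction of Equation (1):

```python
import numpy as np
from collections import OrderedDict

def lru_hit_ratio(trace, M):
    """Object hit ratio of an LRU cache holding M objects on a trace."""
    cache, hits = OrderedDict(), 0
    for x in trace:
        if x in cache:
            hits += 1
            cache.move_to_end(x)           # mark as most recently used
        else:
            if len(cache) >= M:
                cache.popitem(last=False)  # evict least recently used
            cache[x] = None
    return hits / len(trace)

# Simulate a Zipf-like trace and compare LRU to the ideal-LFU closed form:
rng = np.random.default_rng(0)
T, alpha, M = 10_000, 0.9, 500
p = np.arange(1, T + 1) ** -alpha
p /= p.sum()
trace = rng.choice(T, size=200_000, p=p)
lfu = (M ** (1 - alpha) - 1) / (T ** (1 - alpha) - 1)
print(f"LRU hit ratio: {lru_hit_ratio(trace, M):.3f}  "
      f"(ideal-LFU prediction: {lfu:.3f})")
```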