In this section, we analyze whether users are interested in a similar set of
documents on different days. To answer this question, we pick the N most
popular documents from each day and compare the extent of the overlap. Since
all the web pages are dynamically generated, a document is defined as the
combination of a unique URL name and its query parameters (i.e., two requests
with the same URL but different parameters are considered different
document requests). We use the terms document and query interchangeably
in this section.
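As a minimal sketch of the document definition above, the following Python helper maps a request URL to a document identity. The function name `document_key` and the example URLs are hypothetical, and the choices to ignore the host name and to treat the query string as an opaque, order-sensitive string are assumptions made for illustration; the analysis itself may have normalized requests differently.

```python
from urllib.parse import urlsplit


def document_key(request_url):
    """Return a (path, query) pair identifying one document.

    Assumption: the host part is ignored and the query string is
    compared verbatim, so parameter order matters.
    """
    parts = urlsplit(request_url)
    return (parts.path, parts.query)


# Same URL name with different parameters: two different documents.
key_a = document_key("http://example.com/news?id=1")
key_b = document_key("http://example.com/news?id=2")
print(key_a != key_b)
```

Using a tuple as the key keeps it hashable, so it can be counted directly when ranking documents by popularity.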
First, we study the requests from all users, i.e., including wireless, offline,
and desktop users. Figure 12 plots the overlap between each of two weekdays,
August 15 (Tuesday) and August 21 (Monday), and other days (both weekdays and
weekend days). In Figure 12, the curves with points are for pairs of weekdays,
and those without points are for a weekday and a weekend day.
Figure 13 plots the overlap between weekend
days. Note that the x-axis value for the top N case does not always correspond
to exactly N in the graphs. The reason is that when we consider
the top (say) 100 documents, the next few documents after them may
have the same frequency as the 100th document; since we include these documents
as well for the ``top 100'' data point, the plotted points are sometimes
slightly offset from N.
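The procedure described above can be sketched in Python as follows. This is an illustrative sketch, not the code used for the analysis: the function names are hypothetical, each day's log is assumed to be a list of document identifiers, and, because the text does not give an exact formula, the overlap is assumed here to be the number of shared documents divided by the size of the larger of the two top-N sets.

```python
from collections import Counter


def top_n_with_ties(requests, n):
    """Return the n most requested documents, plus any documents
    whose frequency equals that of the n-th document."""
    counts = Counter(requests)
    if not counts or n <= 0:
        return set()
    ranked = counts.most_common()  # sorted by descending frequency
    if n >= len(ranked):
        return {doc for doc, _ in ranked}
    cutoff = ranked[n - 1][1]  # frequency of the n-th document
    return {doc for doc, count in ranked if count >= cutoff}


def overlap(day_a, day_b, n):
    """Fraction of top-n documents shared by two days.

    Assumption: normalized by the larger set, so the value is 1.0
    only when both top-n sets are identical.
    """
    top_a = top_n_with_ties(day_a, n)
    top_b = top_n_with_ties(day_b, n)
    if not top_a or not top_b:
        return 0.0
    return len(top_a & top_b) / max(len(top_a), len(top_b))


# Toy logs: on day_a, "b" and "c" tie for second place.
day_a = ["a"] * 3 + ["b"] * 2 + ["c"] * 2 + ["d"]
day_b = ["a"] * 5 + ["b"] * 4 + ["e"]
print(sorted(top_n_with_ties(day_a, 2)))
print(overlap(day_a, day_b, 2))
```

In the toy data, the "top 2" set for `day_a` contains three documents because of the tie, which is the effect that shifts the plotted x-axis values away from exactly N.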
Figure 12: Temporal stability of document ranking when we compare a weekday versus other days.
Figure 13: Temporal stability of document ranking between weekend days.
Looking at Figure 12, we make the following
observations. First, the overlap between different days is significant.
For example, the overlaps are over 80% for the top 100
documents, and mostly over 70% for the top 1000 documents. This indicates
that the set of popular queries remains relatively stable, and suggests that
we can cache a stable set of popular query results or optimize the data layout
to improve the performance of these queries. For example, workload-based
techniques can be used to generate indices and materialized views
automatically for a database [2]; these techniques are largely
applicable if the database query workload is relatively stable (which is the
case for our browser queries).
Second, the overlap initially fluctuates with the increasing number of
documents picked, and then decreases when the number of top documents picked
is over 100. The initial fluctuation is probably due to the fact that although
very popular documents tend to remain popular, their relative ranking does
change over time. However, as we further increase the number of documents, we
may include some less popular documents. Since these documents are less likely
to remain popular than very popular documents, the temporal overlap
decreases. This phenomenon was also observed in [16].
Third, the overlap between pairs of
weekdays is generally higher than the overlap between a weekend day and a
weekday. The overlap between two weekend days is even higher. This is
consistent with our intuition, and suggests that we should use past weekday
workload to predict future weekday workload, and likewise use past weekend
workload to predict future weekend workload.
We also examine the requests coming from only the wireless users, and find the
results are very similar. As before, the set of popular queries remains stable
over time. The stability is especially high when we consider the most popular
queries. In addition, there is a significant difference between the access
pattern on weekdays versus that on weekends.
Lili Qiu
2002-04-17