Temporal Stability

In this section, we analyze whether users are interested in a similar set of documents on different days. To answer this question, we pick the N most popular documents from each day and compare the extent of their overlap. Since all the web pages are dynamically generated, a document is defined as the combination of a URL and its query parameters (i.e., two requests for the same URL with different parameters are considered requests for different documents). We use the terms document and query interchangeably in this section.

First we study the requests from all users, i.e., wireless, offline, and desktop users. Figure 12 plots the overlap between each of two weekdays, August 15 (Tuesday) and August 21 (Monday), and other days (both weekdays and weekend days). In Figure 12, the curves with points are for pairs of weekdays, and those without points are for a weekday and a weekend day. Figure 13 plots the overlap between weekend days. Note that the x-axis value for the top N case does not always correspond to exactly N in the graphs. The reason is that when we consider the top (say) 100 documents, the next few documents may have the same frequency as the 100th document; since we include these documents as well in the "top 100" data point, the plotted points are sometimes slightly offset from N.
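The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the analysis. The function names (`document_key`, `top_n_with_ties`, `overlap`) are hypothetical. Sorting the query parameters before comparing them is an assumption. Normalizing the overlap by the size of the smaller set is also an assumption, since the text does not say how overlap is normalized when ties make the two sets differ in size.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def document_key(request_url):
    """Identify a document by its URL together with its query parameters."""
    parts = urlsplit(request_url)
    # Assumption: parameter order is irrelevant, so parameters are sorted.
    params = tuple(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return (parts.netloc, parts.path, params)


def top_n_with_ties(requests, n):
    """Return the n most requested documents of one day, plus any documents
    that are requested as often as the n-th one."""
    if n <= 0:
        return set()
    counts = Counter(document_key(r) for r in requests)
    ranked = counts.most_common()
    if len(ranked) <= n:
        return {doc for doc, _ in ranked}
    cutoff = ranked[n - 1][1]  # request count of the n-th document
    return {doc for doc, count in ranked if count >= cutoff}


def overlap(day_a, day_b, n):
    """Fraction of popular documents shared by two days.

    Assumption: the shared count is divided by the size of the smaller set.
    """
    top_a = top_n_with_ties(day_a, n)
    top_b = top_n_with_ties(day_b, n)
    if not top_a or not top_b:
        return 0.0
    return len(top_a & top_b) / min(len(top_a), len(top_b))
```

Because ties are included, `top_n_with_ties` can return more than `n` documents, which is why a plotted point for "top N" may sit slightly to the right of N.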

**Figure 12:** Temporal stability of document ranking when we compare a weekday versus other days (one panel each for August 15 and August 21).

**Figure 13:** Temporal stability of document ranking between weekend days.

Looking at Figure 12, we make the following observations. First, the overlap between different days is significant. For example, the overlaps are over 80% for the top 100 documents, and mostly over 70% for the top 1000 documents. This indicates that the set of popular queries remains relatively stable, and suggests that we can cache a stable set of popular query results or optimize the data layout to improve the performance of these queries. For example, workload-based techniques can be used to generate indices and materialized views automatically for a database [2]; these techniques are largely applicable if the database query workload is relatively stable (which is the case for our browser queries).

Second, the overlap initially fluctuates as the number of documents picked increases, and then decreases once more than the top 100 documents are picked. The initial fluctuation is probably because, although very popular documents tend to remain popular, their relative ranking does change over time. As we further increase the number of documents, we start to include less popular documents. Since these documents are less likely to remain popular than very popular documents, the temporal overlap decreases. This phenomenon was also observed in [16].

Third, the overlap between pairs of weekdays is generally higher than the overlap between a weekend day and a weekday. The overlap between two weekend days is even higher. This is consistent with our intuition, and suggests that we should use past weekday workload to predict future weekday workload, and likewise use past weekend workload to predict future weekend workload.

We also examine the requests coming from only the wireless users, and find that the results are very similar. As before, the set of popular queries remains stable over time, and the stability is especially high for the most popular queries. In addition, there is a significant difference between the access patterns on weekdays and those on weekends.
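One way to act on the third observation is to predict a day's popular queries from the most recent earlier day of the same type. The sketch below is only an illustration of that idea; it is not a method described in this section. The names `is_weekend`, `reference_day`, and `predicted_popular` are hypothetical, and the in-memory mapping from dates to request lists is assumed for brevity.

```python
from collections import Counter
from datetime import date


def is_weekend(day):
    """True for Saturday and Sunday."""
    return day.weekday() >= 5


def reference_day(target, history):
    """Most recent day before `target` with the same type (weekday or
    weekend day), or None if no such day is available."""
    candidates = [
        d for d in history
        if d < target and is_weekend(d) == is_weekend(target)
    ]
    return max(candidates) if candidates else None


def predicted_popular(target, logs, n):
    """Predict the n most popular queries for `target` from its reference day.

    `logs` maps a date to the list of queries requested on that date.
    """
    ref = reference_day(target, logs.keys())
    if ref is None:
        return []
    return [query for query, _ in Counter(logs[ref]).most_common(n)]
```

For example, with logs for a Tuesday, a Saturday, and a Sunday, a prediction for the following Monday would draw on the Tuesday log, while a prediction for the following Saturday would draw on the Sunday log.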

Lili Qiu
2002-04-17