Spatial locality


In this section, we consider the following question: do people in the same geographical region tend to issue similar sets of queries? We use the same approach that we applied to study spatial locality for notification services (described in Section 4.3.1). Figure 14 compares the fraction of documents shared within a geographical cluster with the fraction shared within four random clusters, considering requests from all users (excluding users with invalid IDs). The curve for the geographical clusters overlaps with those for the random clusters. This overlap indicates that the degree of sharing under geographical clustering is comparable to that under random clustering, and hence that the correlation between users' interest in browsing over wireless channels and their geographical location is weak.
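
The comparison behind Figure 14 can be sketched as follows. The exact sharing metric is defined in Section 4.3.1 and is not reproduced here, so this sketch *assumes* that "fraction of documents shared" means the fraction of distinct documents requested within a cluster that are requested by at least two distinct clients of that cluster. The function names `shared_fraction` and `random_clusters` are hypothetical, and each request is modeled as a `(client_id, document)` pair.

```python
import random
from collections import defaultdict


def shared_fraction(requests, cluster):
    """Assumed metric: among distinct documents requested by clients in
    `cluster`, the fraction requested by at least two distinct clients."""
    clients_per_doc = defaultdict(set)
    for client, doc in requests:
        if client in cluster:
            clients_per_doc[doc].add(client)
    if not clients_per_doc:
        return 0.0
    shared = sum(1 for clients in clients_per_doc.values() if len(clients) >= 2)
    return shared / len(clients_per_doc)


def random_clusters(all_clients, size, k=4, seed=0):
    """Draw k random clusters of the given size (the baseline against which
    a geographical cluster of the same size is compared)."""
    rng = random.Random(seed)
    pool = sorted(all_clients)  # sort so results are reproducible for a seed
    return [set(rng.sample(pool, size)) for _ in range(k)]


# Hypothetical toy trace: (client_id, document) pairs.
requests = [("a", "mail"), ("b", "mail"), ("c", "news"),
            ("d", "news"), ("a", "stock:XYZ"), ("e", "weather")]
geo_cluster = {"a", "b"}  # e.g. clients from the same city
geo = shared_fraction(requests, geo_cluster)
rand = [shared_fraction(requests, c)
        for c in random_clusters({"a", "b", "c", "d", "e"}, len(geo_cluster))]
```

If `geo` falls within the spread of the values in `rand`, the data show no geographical locality beyond what random grouping produces, which is the pattern Figure 14 reports.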

**Figure 14:** Local sharing between random sets of clients and clients that are geographically close together.

A possible explanation for the weak correlation is that popular browse content is of global interest. In particular, as mentioned in Section 5.1, 0.1% to 0.5% of the URL and parameter combinations (i.e., about 121 to 442 unique combinations) account for 90% of the requests. With user interest so highly concentrated on a few documents, even clients picked at random share many requests, so geographical locality becomes insignificant. A similar phenomenon was observed in a study of a popular news server [16], where the authors found that the significance of domain membership diminishes during a popular event. A major distinction between that study and ours is the way in which users are clustered: that study clusters users by their DNS names, whereas we cluster users by geographical region, e.g., the city in which they reside.

A natural question follows: why is interest so highly concentrated on popular documents that even clients picked at random share many of them? Examining the most popular URLs and parameters shows that they include the front pages for email login, news, sports, weather, lottery, and the signup application, as well as some popular stock quote queries. Intuitively, these queries are popular with all users regardless of their physical location. The lack of geographical locality implies that the web server's content can be replicated without regard to the geographical location of the clients.

We performed the same spatial locality analysis on requests issued only by wireless clients; Figure 15 summarizes the results. With geographical clustering, wireless clients share slightly more documents than with random clustering; however, the difference between the two clusterings is much less significant than the difference observed for notification documents. This result suggests that using the geographical locality of wireless users as input for optimizing performance (or providing content) will yield limited success.
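
The concentration argument used earlier in this section (0.1% to 0.5% of URL and parameter combinations accounting for 90% of requests) amounts to asking how many of the most popular combinations must be taken before their cumulative request count reaches 90% of the total. A minimal sketch of that computation follows; the function name `head_covering` and the toy URLs are hypothetical, and each request is assumed to be a `(url, params)` tuple.

```python
from collections import Counter


def head_covering(requests, coverage=0.9):
    """Return (k, k / n_unique): the smallest number k of most-requested
    (url, params) combinations whose requests cover `coverage` of all
    requests, and k as a fraction of all unique combinations."""
    counts = Counter(requests)
    if not counts:
        return 0, 0.0
    total = sum(counts.values())
    covered = 0
    k = 0
    for _, count in counts.most_common():  # most popular first
        covered += count
        k += 1
        if covered / total >= coverage:
            break
    return k, k / len(counts)


# Hypothetical toy trace with a skewed popularity distribution.
reqs = ([("/mail/login", "")] * 50 + [("/news", "")] * 30 +
        [("/weather", "zip=98052")] * 15 + [("/quote", "sym=XYZ")] * 3 +
        [("/lottery", "")] * 2)
k, frac = head_covering(reqs)
```

On a real trace, a small `frac` (as reported in Section 5.1) means that almost any group of clients, random or geographical, draws most of its requests from the same few combinations, which is why sharing within random clusters is already high.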

**Figure 15:** Comparison of local sharing between random sets of wireless clients and wireless clients that are geographically close together.


Lili Qiu
2002-04-17