Summary of previous analysis

In [1], we analyzed the browser log collected during the period from August 15, 2000 through August 26, 2000. During this time the web server received 1.6 - 3.2 million requests per day from 64,000 - 98,000 distinct clients. Below is a synopsis of our major findings:
1.
The distribution of document popularity does not closely follow a Zipf-like distribution, whether a document is defined as a unique URL or as a unique URL and parameter pair. The majority of requests are concentrated on a small number of documents. In particular, we found that 0.1% - 0.5% of the documents (i.e., approximately 121 - 442) account for 90% of the requests.
2.
Offline PDA users account for more than 60% of the pages accessed at the web server, and wireless clients account for less than 7% of the accesses; the remaining accesses are due to desktop clients using registration and customization services.
3.
Our analysis of the distribution of reply sizes showed that most of the replies to wireless clients are less than 3 KBytes. For offline clients, most of the replies are less than 6 KBytes. The reply size distribution for the two types of clients is similar.
4.
Our user session analysis showed that users tend to have short sessions when interacting with the web site: 95% of the sessions were less than 3 minutes. We empirically determined the session-inactivity threshold to be somewhere between 30 and 45 seconds (i.e., if no request is received from a client for such a duration, the old session is considered to have ended).
5.
Our category analysis showed that stock quotes, news, and yellow pages are the top categories accessed by wireless clients. For offline clients, help is the most popular category followed by news and stock quotes.
6.
We observed that the relative importance of different categories did not change between weekdays and weekends (except stock quotes and sports). However, the amount of data accessed over the weekend drops by approximately 45%.
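The concentration measurement in finding 1 can be illustrated with a short sketch. This is not the code used in [1]; the function name `docs_covering` and the representation of the log as a plain sequence of document identifiers are assumptions made for illustration. It ranks documents by request count and reports how many of the most popular ones are needed to cover a given share of all requests.

```python
from collections import Counter


def docs_covering(requests, share=0.9):
    """Return (n_docs, fraction_of_docs) needed to cover `share` of requests.

    `requests` is a sequence of document identifiers, one per request
    (hypothetical input format; e.g. a URL or a URL and parameter pair).
    """
    counts = Counter(requests)
    if not counts:
        return 0, 0.0
    target = share * sum(counts.values())
    covered = 0
    # Walk documents from most to least popular until the target is reached.
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered >= target:
            return rank, rank / len(counts)
    return len(counts), 1.0
```

Applied to a log like the one summarized above, the second returned value would correspond to the 0.1% - 0.5% figure, and the first to the approximate document count.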
These findings have the following performance implications:
1.
The high concentration of requests to popular documents in the browser log implies that caching the results of popular queries would be very effective in reducing the web server load.
2.
Since most replies sent to wireless and offline users are small (3 - 6 KB), the wireless web server should be highly optimized for sending short replies. For example, optimizing TCP slow start and re-start [15,23] can be useful in this environment.
3.
Our heuristic for determining the session-inactivity period, which is based on user session analysis, can be useful to wireless service providers who want to reclaim IP addresses. Our analysis showed that IP addresses may be reclaimed more quickly than the time period determined in earlier work [12].
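The session-inactivity heuristic can be sketched as follows. This is a minimal illustration rather than the procedure used in [1]: the function name `split_sessions`, the `(client_id, timestamp)` input format, and the default threshold of 45 seconds (the upper end of the 30 to 45 second range reported above) are all assumptions. A gap between consecutive requests from the same client that exceeds the threshold starts a new session.

```python
from collections import defaultdict


def split_sessions(events, threshold=45.0):
    """Group request timestamps into per-client sessions.

    `events` is an iterable of (client_id, timestamp_in_seconds) pairs
    (hypothetical format). Returns {client_id: [session, ...]}, where each
    session is a sorted list of timestamps.
    """
    by_client = defaultdict(list)
    for client, ts in events:
        by_client[client].append(ts)

    sessions = {}
    for client, stamps in by_client.items():
        stamps.sort()
        current = [stamps[0]]
        result = [current]
        for prev, ts in zip(stamps, stamps[1:]):
            if ts - prev > threshold:
                # Inactivity gap exceeded: the old session has ended.
                current = [ts]
                result.append(current)
            else:
                current.append(ts)
        sessions[client] = result
    return sessions
```

A provider reclaiming IP addresses could treat the end of a client's last session plus the threshold as the earliest point at which the address is considered idle; the right threshold for a given network is an empirical question.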

Lili Qiu
2002-04-17