Summary of previous analysis
In [1], we analyzed the browser log collected from August 15, 2000
through August 26, 2000. During this period the web server received
1.6 - 3.2 million requests per day from 64,000 - 98,000 distinct clients.
Below is a synopsis of our major findings:
1. The distribution of document popularity does not closely follow a
   Zipf-like distribution, whether a document is defined as a unique URL or
   as a unique URL and parameter pair. The majority of requests are
   concentrated on a small number of documents. In particular, we found that
   0.1% - 0.5% of the documents (approximately 121 - 442 documents) account
   for 90% of the requests.
2. More than 60% of the pages accessed at the web server are requested by
   offline PDA users, and fewer than 7% by wireless clients; the remaining
   accesses come from desktop clients using the registration and
   customization services.
3. Our analysis of the distribution of reply sizes showed that most replies
   to wireless clients are smaller than 3 KB, and most replies to offline
   clients are smaller than 6 KB. The reply size distributions for the two
   types of clients are similar.
4. Our user session analysis showed that users tend to have short sessions
   when interacting with the web site: 95% of the sessions lasted less than
   3 minutes. We empirically determined the session-inactivity threshold to
   be between 30 and 45 seconds (i.e., if no request is received from a
   client for that long, the previous session can be considered to have
   ended).
5. Our category analysis showed that stock quotes, news, and yellow pages
   are the top categories accessed by wireless clients. For offline clients,
   help is the most popular category, followed by news and stock quotes.
6. The relative importance of the different categories did not change
   between weekdays and weekends, with the exception of stock quotes and
   sports. However, the amount of data accessed drops by approximately 45%
   over the weekend.
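Findings 1 and 4 rest on two simple log computations, which can be sketched in Python as follows: finding the smallest set of most-popular documents that together receive 90% of the requests, and splitting each client's requests into sessions wherever the gap between requests exceeds an inactivity threshold. The function names, the input formats (a map from document key to request count, and a list of one client's request times in seconds), the example document keys, and the 30-second threshold in the demo are assumptions for illustration; this is not the exact procedure used in [1].

```python
# Hypothetical helpers illustrating findings 1 and 4; not the analysis code from [1].
from typing import Dict, List, Sequence, Tuple


def documents_covering(counts: Dict[str, int], coverage: float = 0.9) -> Tuple[int, float]:
    """Smallest number of documents that together receive at least `coverage`
    of all requests, and that number as a fraction of all documents.

    `counts` maps a document key (e.g., a URL, or a URL plus its parameters)
    to the number of requests it received in the log.
    """
    if not counts:
        return 0, 0.0
    total = sum(counts.values())
    covered = 0
    # Walk documents from most to least popular until the target share is reached.
    for k, c in enumerate(sorted(counts.values(), reverse=True), start=1):
        covered += c
        if covered >= coverage * total:
            return k, k / len(counts)
    return len(counts), 1.0  # reached only if coverage > 1


def split_sessions(timestamps: Sequence[float], threshold: float) -> List[List[float]]:
    """Split one client's request times (in seconds) into sessions.

    A gap longer than `threshold` seconds with no request ends the current
    session; the next request starts a new one.
    """
    sessions: List[List[float]] = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= threshold:
            sessions[-1].append(t)  # still within the same session
        else:
            sessions.append([t])  # first request, or the gap exceeded the threshold
    return sessions


def session_durations(sessions: List[List[float]]) -> List[float]:
    """Duration of each session: last request time minus first request time."""
    return [s[-1] - s[0] for s in sessions]


if __name__ == "__main__":
    counts = {"/quotes": 50, "/news": 30, "/yp": 15, "/help": 5}
    print(documents_covering(counts))  # (3, 0.75)
    times = [0, 10, 25, 200, 215, 600]
    sessions = split_sessions(times, threshold=30)
    print(sessions)  # [[0, 10, 25], [200, 215], [600]]
    print(session_durations(sessions))  # [25, 15, 0]
```

Sweeping `threshold` over a range of values and watching how the number and length of sessions change is one way such a threshold could be chosen empirically.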
These findings have the following performance implications:
1. The high concentration of requests on popular documents in the browser
   log implies that caching the results of popular queries would be very
   effective in reducing the web server load.
2. Since most replies sent to wireless and offline users are small (3 - 6
   KB), the wireless web server should be highly optimized for sending short
   replies; for example, optimizing TCP slow start and restart [15,23] can
   be useful in this environment.
3. Our heuristic for determining the session-inactivity period, based on
   the user session analysis, can be useful to wireless service providers
   that want to reclaim IP addresses. Our analysis showed that IP addresses
   may be reclaimed more quickly than the time period determined in earlier
   work [12].
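Implications 1 and 3 can be made concrete with a minimal Python sketch: a bounded least-recently-used (LRU) cache for query replies, and a check that flags client IP addresses as reclaimable once they have been idle longer than an inactivity threshold. The class and function names, the LRU eviction policy, the example keys and addresses, and the default 45-second threshold (the upper end of the 30 - 45 second range in finding 4) are hypothetical choices for illustration, not a design taken from [1].

```python
# Hypothetical sketch illustrating implications 1 and 3; not a design from [1].
from collections import OrderedDict
from typing import Callable, Dict, Hashable, List


class QueryResultCache:
    """Bounded LRU cache of query replies.

    Keys identify a query, e.g., a URL plus its parameters; values are the
    reply bytes that the web server would otherwise regenerate.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._store: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable, compute: Callable[[], bytes]) -> bytes:
        """Return the cached reply for `key`, calling `compute` on a miss."""
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._store[key]
        self.misses += 1
        reply = compute()  # e.g., run the query against the back end
        self._store[key] = reply
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the least recently used entry
        return reply


def reclaimable_ips(last_seen: Dict[str, float], now: float,
                    threshold: float = 45.0) -> List[str]:
    """IP addresses with no request for more than `threshold` seconds.

    `last_seen` maps each assigned address to the time (in seconds) of its
    most recent request.
    """
    return sorted(ip for ip, t in last_seen.items() if now - t > threshold)


if __name__ == "__main__":
    cache = QueryResultCache(capacity=2)
    for key in ["/quotes?sym=MSFT", "/news", "/quotes?sym=MSFT", "/weather", "/news"]:
        cache.get(key, lambda: b"reply")
    print(cache.hits, cache.misses)  # 1 4
    last_seen = {"10.0.0.1": 100.0, "10.0.0.2": 130.0, "10.0.0.3": 150.0}
    print(reclaimable_ips(last_seen, now=160.0))  # ['10.0.0.1']
```

LRU is only one simple eviction policy; because requests concentrate on a few hundred documents, even a modest capacity might capture most of them, though a suitable capacity would depend on the actual workload.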
Lili Qiu
2002-04-17