
Description of Data Logs

We had access to browse logs covering 12 days of web browsing, from August 15, 2000 through August 26, 2000; these logs contain approximately 33 million entries. We also used notification logs from August 20, 2000 through August 26, 2000, which contain 3.25 million entries. For our analysis of the correlation between the browse and notification services (Section 6), we obtained additional notification logs and performed the comparison over the period from August 15, 2000 through August 26, 2000.

When a registered user sends a browse request to the web server, a unique identifier corresponding to that user is sent to the server and logged in the web traces; for unregistered users, the id field is empty. We use these identifiers to perform the user-based analysis. In addition to the user id, each log record contains other useful information, such as the date, time, type of browser, the URL accessed, and the data received and sent by the server.

When a notification message is sent, a record is logged in a database, and we obtained part of this database for our analysis. Each database entry contains information about the server from which the notification message was sent, a user id, the type of device to which the message was sent (e.g., phone or pager), the type of alert, and the time at which it was sent.

To manipulate this large volume of log data (over 10 GB) efficiently, we loaded the logs into a commercial database system and created indices on columns such as date, user id, and URL. Because the database query language has limited support for string manipulation, we further processed the database output using Perl scripts.
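
As an illustration only, the workflow described above (load the logs into a database, index the date, user id, and URL columns, then post-process query output with a script) might be sketched as follows. This is not the tooling used in the study: SQLite and Python stand in for the commercial database system and the Perl scripts, and the table name, column names, and sample records are all hypothetical.

```python
import sqlite3
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical sample records: (date, time, user_id, browser, url, bytes_sent).
# An empty user_id marks a request from an unregistered user.
SAMPLE_ROWS = [
    ("2000-08-15", "09:01:12", "u1", "Mozilla/4.7", "http://example.com/news/a.html", 5120),
    ("2000-08-15", "09:02:40", "", "Mozilla/4.7", "http://example.com/news/b.html", 2048),
    ("2000-08-16", "10:15:03", "u1", "MSIE 5.0", "http://example.org/stocks/q?s=X", 1024),
    ("2000-08-16", "11:30:59", "u2", "MSIE 5.0", "http://example.com/news/a.html", 5120),
]


def build_db(rows):
    """Load log records into an in-memory table and index the lookup columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE browse_log ("
        "log_date TEXT, log_time TEXT, user_id TEXT, "
        "browser TEXT, url TEXT, bytes_sent INTEGER)"
    )
    conn.executemany("INSERT INTO browse_log VALUES (?, ?, ?, ?, ?, ?)", rows)
    # Indices on the columns named in the text: date, user id, and URL.
    conn.execute("CREATE INDEX idx_date ON browse_log (log_date)")
    conn.execute("CREATE INDEX idx_user ON browse_log (user_id)")
    conn.execute("CREATE INDEX idx_url ON browse_log (url)")
    return conn


def requests_per_user(conn):
    """Count requests per registered user; rows with an empty id are skipped."""
    cur = conn.execute(
        "SELECT user_id, COUNT(*) FROM browse_log "
        "WHERE user_id <> '' GROUP BY user_id ORDER BY user_id"
    )
    return dict(cur.fetchall())


def requests_per_host(conn):
    """Post-process query output outside SQL: extract the host from each URL."""
    counts = Counter()
    for (url,) in conn.execute("SELECT url FROM browse_log"):
        counts[urlsplit(url).hostname] += 1
    return dict(counts)


conn = build_db(SAMPLE_ROWS)
per_user = requests_per_user(conn)
per_host = requests_per_host(conn)
print(per_user)
print(per_host)
```

The per-user count is done in SQL, where the index on the user id column can help, while the host extraction is done in the script, mirroring the split between database queries and script-based string processing described in the text.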
Lili Qiu
2002-04-17