Next: Types of Analyses
Up: Data Characteristics
Previous: Types of Accesses
We had access to logs for 12 days of web browsing from August 15, 2000 through
August 26, 2000. There were approximately 33 million entries in the browse
logs. Additionally, we used notification logs from August 20, 2000 through
August 26, 2000, which contained 3.25 million entries. For our analysis of the
correlation between browse and notification services
(Section 6), we obtained additional notification logs and
performed the comparison for the period from August 15, 2000 through August
26, 2000.
When a registered user sends a browse request to the web server, a unique
identifier corresponding to the user is sent to the server and logged in the
web traces (for unregistered users, the id field is empty). We use these
identifiers for the user-based analysis. Each log record also contains other
useful information, such as the date, time, browser type, URL accessed, and
data received and sent by the server.
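
As an illustration of the user-based grouping described above, the following is a minimal sketch in Python. The tab-separated layout, the field names in `FIELDS`, and the sample records are assumptions made for this example; the actual log format is not specified here.

```python
from collections import defaultdict

# Hypothetical field order; the real log layout is not given in the text.
FIELDS = ["date", "time", "user_id", "browser", "url",
          "bytes_received", "bytes_sent"]


def parse_record(line):
    """Split one tab-separated log line into a dict keyed by FIELDS."""
    values = line.rstrip("\n").split("\t")
    return dict(zip(FIELDS, values))


def requests_per_user(lines):
    """Count requests per registered user, skipping records with an empty id."""
    counts = defaultdict(int)
    for line in lines:
        record = parse_record(line)
        user_id = record.get("user_id", "")
        if user_id:  # unregistered users have an empty id field
            counts[user_id] += 1
    return dict(counts)


# Made-up sample records, for illustration only.
sample = [
    "2000-08-15\t09:00:01\tu1\tBrowserA\t/news\t120\t2048",
    "2000-08-15\t09:00:05\t\tBrowserB\t/stocks\t98\t1024",
    "2000-08-16\t10:30:00\tu1\tBrowserA\t/weather\t110\t512",
]
print(requests_per_user(sample))
```

Records with an empty id are dropped before counting, so requests from unregistered users do not contribute to any per-user statistic.
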
When a notification message is sent, a record is logged in a database. We
obtained part of this database for our analysis. Each database entry contains
the server from which the notification message was sent, a user id, the type
of device to which the message was sent (e.g., phone or pager), the type of
alert, and the time it was sent, among other fields.
To manipulate the large volume of log data (over 10 GB) efficiently, we
loaded the logs into a commercial database system and created indices on
columns such as date, user id, and URL. To work around the limited string
manipulation support of the database query language, we further processed the
database output using Perl scripts.
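
The workflow above (indexed database queries followed by string post-processing in a script) can be sketched as follows. This is only an illustration under stated assumptions: SQLite stands in for the commercial database, Python stands in for Perl, and the table name `browse_log`, its columns, the sample rows, and the helper `top_level_category` are all hypothetical.

```python
import sqlite3
from collections import Counter
from urllib.parse import urlsplit

# SQLite in memory stands in for the commercial database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE browse_log (log_date TEXT, user_id TEXT, url TEXT)")

# Index the columns that the queries filter or group on.
for column in ("log_date", "user_id", "url"):
    conn.execute(f"CREATE INDEX idx_{column} ON browse_log ({column})")

# Made-up rows, for illustration only.
rows = [
    ("2000-08-15", "u1", "http://example.com/news/world?id=1"),
    ("2000-08-15", "", "http://example.com/stocks/quote?sym=X"),
    ("2000-08-16", "u1", "http://example.com/news/sports"),
]
conn.executemany("INSERT INTO browse_log VALUES (?, ?, ?)", rows)


def top_level_category(url):
    """Return the first path component of a URL, or '' if there is none."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    return parts[0] if parts else ""


# The database does the date filtering; the script does the string handling.
cursor = conn.execute(
    "SELECT url FROM browse_log WHERE log_date BETWEEN ? AND ?",
    ("2000-08-15", "2000-08-26"),
)
category_counts = Counter(top_level_category(url) for (url,) in cursor)
print(category_counts)
```

The split mirrors the division of labor described in the text: selection and range filtering stay in the database, where the indices help, while URL parsing happens in the scripting layer.
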
Lili Qiu
2002-04-17