
   
5.2 Reconstruction of Web Page Accesses

With the help of the Knowledge Base, EtE monitor processes the entire Transaction Log a second time. In this pass, EtE monitor does not exclude entries that lack referer fields. Using data structures similar to those introduced in Section 5.1, EtE monitor scans the sorted Transaction Log and creates a new Client Access Table that stores all accesses, as depicted in Figure 2. For each transaction, EtE monitor locates the Web Page Table for the client's IP address in the Client Access Table. It then handles the transaction according to its content type:

1. If the content type is text/html, EtE monitor creates a new web page entry in the Web Page Table.

2. If a transaction is an independent, single page object, EtE monitor marks it as an individual page without any embedded objects and allocates a new web page entry in the Web Page Table.

3. For other content types that can be embedded in a web page, EtE monitor attempts to insert the object into the web page that contains it.
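The three policies above amount to a dispatch on content type, which can be sketched as follows. This is a minimal illustration rather than EtE monitor's actual code: the `Action` enum, the `classify` function, and the two content-type sets are hypothetical names, and the set members are example values only. In EtE monitor, the decision about which objects are independent or embeddable would draw on the Knowledge Base instead of fixed sets.

```python
from enum import Enum


class Action(Enum):
    NEW_PAGE = "new_page"                # policy 1: text/html starts a new web page
    INDIVIDUAL_PAGE = "individual_page"  # policy 2: independent, single page object
    EMBED = "embed"                      # policy 3: try to insert into containing page
    UNHANDLED = "unhandled"              # none of the three policies applies


# Hypothetical example sets (assumptions for illustration only).
INDEPENDENT_TYPES = {"application/pdf", "application/postscript"}
EMBEDDABLE_TYPES = {"image/gif", "image/jpeg", "text/css"}


def classify(content_type: str) -> Action:
    """Map a transaction's content type to one of the three policies."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "text/html":
        return Action.NEW_PAGE
    if media_type in INDEPENDENT_TYPES:
        return Action.INDIVIDUAL_PAGE
    if media_type in EMBEDDABLE_TYPES:
        return Action.EMBED
    return Action.UNHANDLED
```

For example, `classify("text/html; charset=utf-8")` yields `Action.NEW_PAGE`, while `classify("image/gif")` yields `Action.EMBED`.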

If none of the above policies can be applied, EtE monitor drops the request. These heuristics may introduce some mistakes, so EtE monitor also uses a configurable think time threshold to delimit web pages: if the time gap between an object and the tail of the web page to which it would be appended is larger than the threshold, EtE monitor skips that object. In this paper, we set the think time threshold to 4 sec.
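The think time check can be illustrated with a short sketch. It rests on several assumptions that the text does not specify: the `WebPage` class and `try_append` function are hypothetical names, timestamps are in seconds, and the "tail" of a page is taken to be the timestamp of the object most recently appended to it (or the page's own timestamp if it has no objects yet). The Client Access Table is modeled as a plain dictionary from client IP to a list of pages.

```python
from dataclasses import dataclass, field
from typing import Dict, List

THINK_TIME_THRESHOLD = 4.0  # seconds; configurable, 4 sec as in the text


@dataclass
class WebPage:
    url: str
    start: float                              # timestamp of the page itself
    object_times: List[float] = field(default_factory=list)

    @property
    def tail(self) -> float:
        # Assumption: the tail is the most recently appended object,
        # or the page's own timestamp when nothing is embedded yet.
        return self.object_times[-1] if self.object_times else self.start


def try_append(page: WebPage, timestamp: float,
               threshold: float = THINK_TIME_THRESHOLD) -> bool:
    """Append an embedded object unless the gap to the page tail exceeds the threshold."""
    if timestamp - page.tail > threshold:
        return False  # gap too large: the object is skipped
    page.object_times.append(timestamp)
    return True


# Client Access Table: client IP -> Web Page Table (here, a list of pages).
client_access_table: Dict[str, List[WebPage]] = {}
pages = client_access_table.setdefault("10.0.0.1", [])
pages.append(WebPage("/index.html", start=10.0))
try_append(pages[-1], 12.0)   # 2 sec after the tail: appended
try_append(pages[-1], 20.5)   # 8.5 sec after the tail: skipped
```

The comparison is strict, matching "larger than the threshold": an object that arrives exactly 4 sec after the tail is still appended.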

