5.2 Reconstruction of Web Page Accesses
With the help of the Knowledge Base, EtE monitor processes the
entire Transaction Log again. This time, EtE monitor does not exclude
the entries without referer fields.
Using data structures similar to those introduced in Section
5.1, EtE monitor scans the sorted Transaction Log
and creates a new Client Access Table to store all accesses as
depicted in Figure 2.
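
The two-level lookup described above (client IP to Web Page Table, then page entries holding their objects) can be sketched as follows. This is a minimal illustration, not the EtE implementation: the class and field names (`Transaction`, `WebPage`, `ClientAccessTable`, `web_page_table`) are hypothetical, and the choice of a dictionary keyed by IP address is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Transaction:
    """One Transaction Log entry (field names are illustrative)."""
    client_ip: str
    flow_id: int
    url: str
    content_type: str
    timestamp: float               # seconds
    referer: Optional[str] = None  # None when the referer field is not set


@dataclass
class WebPage:
    """One web page entry: the page itself plus its appended objects."""
    url: str
    nonexistent: bool = False      # page was created only from a referer field
    objects: List[Transaction] = field(default_factory=list)


class ClientAccessTable:
    """Maps each client IP to its Web Page Table (a list of page entries)."""

    def __init__(self) -> None:
        self._tables: Dict[str, List[WebPage]] = {}

    def web_page_table(self, client_ip: str) -> List[WebPage]:
        # Create an empty Web Page Table the first time a client is seen.
        return self._tables.setdefault(client_ip, [])
```

Keeping each Web Page Table as a list in insertion order means that, for a Transaction Log sorted by time, the last element is the most recently accessed page.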
For each transaction, EtE monitor locates the Web Page Table for the
client's IP in the Client Access Table. Then, EtE monitor handles
the transaction depending on the content type:
1. If the content type is text/html, EtE monitor creates a new
web page entry in the Web Page Table.
2. If a transaction is an independent, single page object, EtE monitor
marks it as an individual page without any embedded objects and
allocates a new web page entry in the Web Page Table.
3. For objects of other content types that can be embedded in a web
page, EtE monitor attempts to insert the object into the web page that
contains it.
- If the referer field is set for this transaction, EtE
monitor attempts to locate the referred page in the following
way. If the referred HTML file is in an existing page entry in
the Web Page Table, EtE monitor appends the object at the end of
the entry. If the referred HTML file does not exist in the client's
Web Page Table, EtE monitor first creates a new web page entry
in the table for the referred page and marks it as nonexistent.
Then it appends the object to this page.
- If the referer field is not set for this transaction, EtE monitor
uses the following policies.
  - With the help of the Knowledge Base, EtE
  monitor checks each page entry in the Web Page Table from
  the latest to earliest. If the Knowledge Base contains the content
  template for the checked page and the considered object does
  not belong to it, EtE monitor skips the entry and checks the next one
  until a page containing the object is found. If such an entry is
  found, EtE monitor appends the object to the end of the web page.
  - If none of the web page entries in the Web Page Table
  contains the object based on the Knowledge Base, EtE
  monitor searches in the client's Web Page Table for a web page
  accessed via the same flow ID as this object. If there is such a web
  page, EtE monitor appends the object to the page.
  - Otherwise, if there are any accessed web pages in the table,
  EtE monitor appends the object to the latest accessed one.
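
The content-type dispatch and the case where the referer field is set can be sketched as below. This is an illustrative sketch under stated assumptions, not the EtE code: `Page`, `find_page`, and `handle_transaction` are hypothetical names, the `standalone` flag stands in for whatever test identifies an independent, single page object (the text does not specify it), and picking the most recent entry when a URL appears more than once is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Page:
    url: str
    nonexistent: bool = False   # entry created only because a referer named it
    standalone: bool = False    # individual page without embedded objects
    objects: List[str] = field(default_factory=list)


def find_page(table: List[Page], url: str) -> Optional[Page]:
    """Return the most recent page entry for url, or None (assumed tie-break)."""
    for page in reversed(table):
        if page.url == url:
            return page
    return None


def handle_transaction(table: List[Page], url: str, content_type: str,
                       referer: Optional[str] = None,
                       standalone: bool = False) -> Optional[Page]:
    """Place one transaction in a client's Web Page Table.

    Returns the page entry that received the transaction, or None when the
    referer field is not set and the no-referer policies must decide.
    """
    if content_type == "text/html":
        # Case 1: an HTML file starts a new web page entry.
        page = Page(url)
        table.append(page)
        return page
    if standalone:
        # Case 2: independent single-page object, no embedded objects.
        page = Page(url, standalone=True)
        table.append(page)
        return page
    if referer is not None:
        # Case 3 with referer: append to the referred page, creating a
        # placeholder entry marked nonexistent if the page was never seen.
        page = find_page(table, referer)
        if page is None:
            page = Page(referer, nonexistent=True)
            table.append(page)
        page.objects.append(url)
        return page
    return None  # Case 3 without referer: left to the policy-based search
```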
If none of the policies for transactions without a referer field
applies, EtE monitor drops the request. These heuristics may
introduce some mistakes. Thus, EtE monitor also uses a configurable
think time threshold to delimit web pages. If the time gap between
the object and the tail of the web page that it tries to append to is
larger than the threshold, EtE monitor skips the considered object.
In this paper, we set the think time threshold to 4 sec.
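
The policies for objects without a referer field, combined with the think time threshold, can be sketched as follows. This is one possible reading, not the EtE implementation. Assumptions: the Knowledge Base is modeled as a dictionary from page URL to the set of object URLs in its content template; only pages whose template lists the object count as a match in the first policy; the flow ID search runs from latest to earliest; the Web Page Table is kept in access order; and an object that exceeds the threshold is simply not appended. All names (`Page`, `choose_page`, `attach_without_referer`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

THINK_TIME_THRESHOLD = 4.0  # seconds, the value used in the text


@dataclass
class Page:
    url: str
    flow_id: int
    tail_time: float            # time of the most recent entry in this page
    objects: List[str] = field(default_factory=list)


def choose_page(table: List[Page], kb: Dict[str, Set[str]],
                obj_url: str, flow_id: int) -> Optional[Page]:
    """Pick the candidate page for an object whose referer is not set."""
    # Policy 1: latest to earliest, find a page whose content template
    # in the Knowledge Base contains the object.
    for page in reversed(table):
        template = kb.get(page.url)
        if template is not None and obj_url in template:
            return page
    # Policy 2: a page accessed via the same flow ID as the object.
    for page in reversed(table):
        if page.flow_id == flow_id:
            return page
    # Policy 3: the latest accessed page, if the table has any page.
    return table[-1] if table else None


def attach_without_referer(table: List[Page], kb: Dict[str, Set[str]],
                           obj_url: str, flow_id: int, obj_time: float,
                           threshold: float = THINK_TIME_THRESHOLD
                           ) -> Optional[Page]:
    """Append the object to a page, or return None if it is dropped/skipped."""
    page = choose_page(table, kb, obj_url, flow_id)
    if page is None:
        return None             # no policy applies: drop the request
    if obj_time - page.tail_time > threshold:
        return None             # gap exceeds think time: skip the object
    page.objects.append(obj_url)
    page.tail_time = obj_time   # the object becomes the new tail of the page
    return page
```

Passing `threshold` as a parameter mirrors the configurable threshold described in the text; the default of 4 seconds is the value the paper reports using.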