Next: 5.1 Building a Knowledge Up: EtE: Passive End-to-End Internet Previous: 4. Request-Response Reconstruction Module

   
5. Page Reconstruction Module

To measure the client-perceived end-to-end response time for retrieving a web page, one needs to identify the objects embedded in that page and to measure the response time of the client requests that retrieve these objects from the web server. Although some embedded objects of a web page can be determined by parsing the HTML of the "container object", others cannot easily be discovered through static parsing. For example, JavaScript is used in web pages to retrieve additional objects; without executing the JavaScript, it may be difficult to discover the identity of such objects.
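The limitation of static parsing can be illustrated with a minimal sketch. It is not part of EtE: the class `EmbeddedObjectParser`, the function `static_embedded_objects`, and the sample page are hypothetical, and only the Python standard library is assumed. The parser collects URLs named directly in `img`, `script`, and `link` tags, and therefore misses an image whose URL is assembled by inline JavaScript.

```python
from html.parser import HTMLParser


class EmbeddedObjectParser(HTMLParser):
    """Hypothetical helper: collects URLs named directly in the markup."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Objects referenced through a src attribute.
        if tag in ("img", "script") and attrs.get("src"):
            self.urls.append(attrs["src"])
        # Objects referenced through a link element (e.g. style sheets).
        elif tag == "link" and attrs.get("href"):
            self.urls.append(attrs["href"])


def static_embedded_objects(html):
    """Return the embedded-object URLs visible to static parsing."""
    parser = EmbeddedObjectParser()
    parser.feed(html)
    parser.close()
    return parser.urls


# Illustrative container object; the last script builds a URL at run time.
SAMPLE = """
<html><body>
<img src="/logo.gif">
<script src="/menu.js"></script>
<script>var i = new Image(); i.src = "/img/" + "banner" + ".gif";</script>
</body></html>
"""

found = static_embedded_objects(SAMPLE)
print(found)
```

In this sketch the image requested by the inline script never appears in `found`, because its URL exists only after the script runs.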

Automatically determining the content of a page requires a technique to delimit individual page accesses. One recent study [6] uses an estimate of client think time as the delimiter between two pages. While this method is simple and useful, it may be inaccurate in some important cases. For example, consider a client that opens two web pages from one server at the same time: the requests for the two pages interleave without any think time between them. In another case, the interval between requests for objects within one page may be too long to be distinguished from think time (perhaps because of network conditions).
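Both failure cases can be reproduced with a small sketch of a think-time delimiter. This is an illustration only, not the method of [6]: the function `split_by_think_time`, the 1-second threshold, and the sample timestamps are all assumptions chosen for the example.

```python
def split_by_think_time(requests, threshold):
    """Group one client's requests into pages using a think-time gap.

    requests:  list of (timestamp_seconds, url), sorted by timestamp.
    threshold: assumed think-time estimate in seconds; a gap larger than
               this starts a new page.
    """
    pages = []
    last_time = None
    for ts, url in requests:
        if last_time is None or ts - last_time > threshold:
            pages.append([])  # gap exceeded: start a new page
        pages[-1].append(url)
        last_time = ts
    return pages


# Case 1: two pages opened at the same time; their requests interleave.
interleaved = [
    (0.0, "/a.html"), (0.1, "/b.html"),
    (0.3, "/a1.gif"), (0.4, "/b1.gif"),
]
merged = split_by_think_time(interleaved, threshold=1.0)

# Case 2: one page whose last object is delayed (e.g. by the network).
slow = [(0.0, "/c.html"), (0.2, "/c1.gif"), (2.5, "/c2.gif")]
split = split_by_think_time(slow, threshold=1.0)

print(merged)
print(split)
```

With these inputs the delimiter merges the two interleaved pages into a single group and cuts the slow page in two, which are the two inaccuracies described above.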

Unlike previous work, our methodology uses heuristics to determine the objects composing a web page and applies statistics to adjust the results. EtE uses the HTTP referer field as a major "clue" to group objects into a web page. The referer field specifies the URL from which the requested URL was obtained. Thus, all requests for the embedded objects in an HTML file are expected to set their referer fields to the URL of that HTML file. However, referer fields are set by client browsers, and not all browsers set them. To handle this, the EtE monitor first builds a Knowledge Base from the requests that have referer fields, and then uses more aggressive heuristics, based on the Knowledge Base information, to group the requests without referer fields.
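The idea of using the referer field as a grouping clue can be sketched as follows. This is a simplified illustration under stated assumptions, not the EtE implementation: the request tuples, the use of the content type to tell container objects from embedded ones, and the functions `build_knowledge_base` and `guess_container` are hypothetical, and the lookup shown is a plain stand-in for the heuristics of Subsection 5.2. Paths are used in place of full URLs for brevity.

```python
from collections import defaultdict


def build_knowledge_base(requests):
    """Map each container URL to the embedded objects seen with it.

    requests: list of (url, content_type, referer_or_None).
    Assumption: a non-HTML response whose request carries a referer is
    treated as an object embedded in the page named by the referer.
    """
    kb = defaultdict(set)
    for url, content_type, referer in requests:
        if referer is not None and content_type != "text/html":
            kb[referer].add(url)
    return dict(kb)


def guess_container(url, kb):
    """For a request without a referer, list the known pages embedding url."""
    return sorted(page for page, objects in kb.items() if url in objects)


# Hypothetical request log; the last entry comes from a browser that
# did not set the referer field.
log = [
    ("/index.html", "text/html", None),
    ("/logo.gif", "image/gif", "/index.html"),
    ("/style.css", "text/css", "/index.html"),
    ("/news.html", "text/html", "/index.html"),
    ("/photo.jpg", "image/jpeg", "/news.html"),
    ("/logo.gif", "image/gif", None),
]

kb = build_knowledge_base(log)
candidates = guess_container("/logo.gif", kb)
unknown = guess_container("/unknown.gif", kb)
print(kb)
print(candidates, unknown)
```

Here the request for `/news.html` is not recorded as an embedded object even though it carries a referer, since under the assumption above an HTML response starts a new page. A request without a referer is matched only if the Knowledge Base has already seen the same object embedded in some page.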

Subsection 5.1 outlines Knowledge Base construction of web page objects. Subsection 5.2 presents the algorithm and technique to group the requests in web page accesses using Knowledge Base information and a set of additional heuristics. Subsection 5.3 introduces a statistical analysis to identify valid page access patterns and to filter out incorrectly constructed accesses.



 