Next: 5. Page Reconstruction Module Up: EtE: Passive End-to-End Internet Previous: 3. EtE Monitor Architecture

4. Request-Response Reconstruction Module

As described above, the Request-Response Reconstruction module reconstructs all observed TCP connections. The TCP connections are rebuilt from the Network Trace using client IP addresses, client port numbers, and request (response) TCP sequence numbers. Within the payload of each rebuilt TCP connection, HTTP transactions can then be delimited according to the message boundaries defined by the HTTP protocol. At the same time, the timestamps, sequence numbers, and acknowledged sequence numbers of HTTP requests can be recorded so that each request can later be matched with its corresponding HTTP response.
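
The reconstruction step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not EtE monitor's implementation: only data-carrying segments are passed in, each as a hypothetical `(client_ip, client_port, direction, seq, payload)` tuple; sequence-number wraparound is ignored; and requests are assumed to carry no body (e.g. `GET`), so each one ends at the blank line that closes its header block. The helper names `group_by_connection`, `reassemble`, and `split_requests` are hypothetical.

```python
from collections import defaultdict

def group_by_connection(packets):
    """Group data segments into flows keyed by (client IP, client port).

    Each packet is a hypothetical tuple
    (client_ip, client_port, direction, seq, payload), where direction is
    "req" for client-to-server data and "resp" for server-to-client data.
    """
    flows = defaultdict(lambda: {"req": [], "resp": []})
    for client_ip, client_port, direction, seq, payload in packets:
        flows[(client_ip, client_port)][direction].append((seq, payload))
    return flows

def reassemble(segments):
    """Rebuild one direction's byte stream from (seq, payload) pairs.

    Segments are ordered by sequence number, and bytes already covered by
    an earlier segment (e.g. retransmissions) are skipped. Sequence-number
    wraparound is ignored in this sketch.
    """
    stream = bytearray()
    next_seq = None
    for seq, payload in sorted(segments):
        if next_seq is None:
            next_seq = seq
        if seq + len(payload) <= next_seq:
            continue  # entirely duplicate data, e.g. a retransmission
        if seq > next_seq:
            raise ValueError(f"gap in trace at sequence number {next_seq}")
        stream += payload[next_seq - seq:]  # keep only the new bytes
        next_seq = seq + len(payload)
    return bytes(stream)

def split_requests(stream):
    """Split a client byte stream into bodiless HTTP requests.

    Each request is assumed to end at the blank line that terminates its
    header block; a trailing incomplete request is dropped.
    """
    requests = []
    while True:
        end = stream.find(b"\r\n\r\n")
        if end < 0:
            break
        requests.append(stream[:end + 4])
        stream = stream[end + 4:]
    return requests
```

Retransmitted segments are simply skipped here because their bytes are already covered; a monitor that also reports resent packets would count them at this point rather than discard them silently.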

When a client clicks a hypertext link to retrieve a particular web page, the browser first establishes a TCP connection with the web server by sending a SYN packet. If the server is ready to process the request, it accepts the connection by sending back a SYN packet of its own that acknowledges the client's SYN ⁴. At this point, the client is ready to send HTTP requests to retrieve the HTML file and all embedded objects. For each request, we are concerned with the timestamps of the first and last bytes of the request, since they delimit the request transfer time and mark the beginning of server processing. Similarly, we are concerned with the timestamps of the first and last bytes of the corresponding HTTP response.
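
The four timestamps per transaction translate directly into the intervals described above. The sketch below assumes a hypothetical `HttpTiming` record holding timestamps in seconds as observed at the monitor's vantage point; the "server processing" interval it computes is therefore only the gap between the last request byte and the first response byte seen on the wire, not a measurement taken inside the server.

```python
from dataclasses import dataclass

@dataclass
class HttpTiming:
    """Hypothetical record: monitor-observed timestamps (seconds) for one transaction."""
    req_first: float   # first byte of the HTTP request
    req_last: float    # last byte of the HTTP request
    resp_first: float  # first byte of the HTTP response
    resp_last: float   # last byte of the HTTP response

    def request_transfer(self) -> float:
        # Time spent receiving the request bytes.
        return self.req_last - self.req_first

    def server_processing(self) -> float:
        # Gap between the end of the request and the start of the response,
        # as seen by the monitor.
        return self.resp_first - self.req_last

    def response_transfer(self) -> float:
        # Time spent sending the response bytes.
        return self.resp_last - self.resp_first
```

For example, `HttpTiming(0.0, 0.5, 2.0, 3.5)` yields a 0.5 s request transfer, a 1.5 s processing gap, and a 1.5 s response transfer.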

EtE monitor detects aborted connections by observing either a RST packet sent by the HTTP client to explicitly abort the connection, or a FIN/ACK packet sent by the client whose acknowledged sequence number is less than the maximum sequence number observed from the server (that is, the client closed the connection before acknowledging all of the response data). After reconstructing an HTTP transaction (a request and its corresponding response), the monitor records the HTTP header lines in the Transaction Log and discards the body of the HTTP response.
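
The two abort conditions can be expressed as a small predicate. This is a sketch under assumptions: the hypothetical `Packet` record carries only the fields needed here; FIN segments are assumed to have the ACK flag set, as they do in practice; the "maximum sequence number sent from the server" is taken to be the end of the server's highest data segment (`seq + payload_len`), because an ACK names the next byte the client expects; and sequence-number wraparound is ignored.

```python
from typing import Iterable, NamedTuple

class Packet(NamedTuple):
    """Hypothetical per-packet record extracted from the trace."""
    from_client: bool
    seq: int
    ack: int
    payload_len: int
    fin: bool = False
    rst: bool = False

def is_aborted(packets: Iterable[Packet]) -> bool:
    """Apply the two abort conditions to one connection's packets."""
    packets = list(packets)
    # End of the highest server data segment seen in the trace.
    server_max = max(
        (p.seq + p.payload_len for p in packets
         if not p.from_client and p.payload_len > 0),
        default=None,
    )
    for p in packets:
        if not p.from_client:
            continue  # only client-sent RST and FIN packets are considered
        if p.rst:
            return True  # client explicitly reset the connection
        if p.fin and server_max is not None and p.ack < server_max:
            return True  # client closed before acknowledging all response data
    return False
```

Because the maximum is computed over the whole connection, this check runs after the fact on a complete trace rather than packet by packet.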

Each entry in the log includes the following fields: (1) a unique flow ID for the TCP connection, (2) the client's IP address, (3) the requested URL, (4) the content type, (5) the Referer field, (6) the Via field, (7) whether the request was aborted, (8) the number of packets resent during the connection (a potential indicator of network congestion), and (9) the sizes and timestamps of the request and response. Some of these fields are used to rebuild web pages, while others are used to measure end-to-end performance.
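
One way to model a Transaction Log entry is a plain record type. The field names, the split of item (9) into separate size and timestamp fields, and the tab-separated `to_tsv` serialization are all hypothetical choices for illustration; the section does not specify an on-disk format.

```python
from dataclasses import astuple, dataclass
from typing import Optional

@dataclass
class TransactionLogEntry:
    """Hypothetical layout of one Transaction Log entry."""
    flow_id: int                 # (1) unique ID of the TCP connection
    client_ip: str               # (2) client's IP address
    url: str                     # (3) requested URL
    content_type: Optional[str]  # (4) e.g. "text/html"
    referer: Optional[str]       # (5) Referer header, if present
    via: Optional[str]           # (6) Via header, if present
    aborted: bool                # (7) whether the request was aborted
    resent_packets: int          # (8) packets resent on the connection
    request_size: int            # (9) sizes in bytes
    response_size: int
    req_first: float             # (9) timestamps in seconds
    req_last: float
    resp_first: float
    resp_last: float

    def to_tsv(self) -> str:
        """Serialize as one tab-separated line; missing values become '-'."""
        return "\t".join("-" if v is None else str(v) for v in astuple(self))
```

Keeping header-derived fields and timing fields in one flat record lets later modules read the same entry for both page reconstruction and performance measurement.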

An alternative way to collect most of the fields of a Transaction Log entry is to extend web server functionality: Apache, Netscape, and IIS all provide APIs suitable for this kind of server instrumentation. This approach has some merits: 1) since the web server handles request-response processing directly, reconstructing TCP connections becomes unnecessary; and 2) it can handle encrypted connections.

However, the primary drawback of this approach is that web servers must be modified in an application-specific manner, whereas our approach is independent of any particular server technology. Moreover, instrumentation solutions cannot obtain network-level information, such as the connection setup time and the number of resent packets, which EtE monitor can observe.