8. Validation Experiments

We performed two groups of experiments to validate the accuracy of EtE monitor performance measurements and its page access reconstruction power.

In the first experiment, we used two remote clients, residing at Duke University and Michigan State University, to issue a sequence of 40 requests for a designated web page from the HPLabs external web site. The page consists of an HTML file and 7 embedded images, with a total size of 175 Kbytes. To issue these requests, we used httperf [16], a tool that measures the connection setup time and the end-to-end time observed by the client for a full page download. At the same time, an EtE monitor measured the performance of the HPLabs external web site. From the EtE monitor measurements, we extracted the statistics for the designated client accesses. Additionally, in EtE monitor, we computed the end-to-end time using two approaches that differ slightly from those discussed in Section 6.1; they appear in Table 6 as EtE time (last byte) and EtE time (ACK).

Table 6: Experimental results validating the accuracy of EtE monitor performance measurements.

Client	httperf Conn Setup	httperf Resp. time	EtE monitor Conn Setup	EtE time (last byte)	EtE time (ACK)
Michigan	0.074	1.027	0.088	0.953	1.026
Duke	0.102	1.38	0.117	1.28	1.38

All times are in seconds.

The connection setup time reported by EtE monitor is slightly higher (by 14-15 ms) than the setup time measured by httperf, because it includes not only the time to establish a TCP connection but also the time to receive the first byte of the request. The EtE time (ACK) coincides with the response time observed by the client. The EtE time (last byte) is lower than the observed response time by one round trip delay (the connection setup time measured by httperf represents the round trip time for each client, 74-102 ms). These measurements match our expectations for EtE monitor accuracy (see Section 6.1). Thus, we have some confidence that EtE monitor accurately approximates the response time observed by the client.
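The relationships described above can be checked directly against the values in Table 6. The following is a minimal sketch: the dictionary layout, the function name `check_row`, and the 5 ms tolerance are assumptions introduced here for illustration and are not part of EtE monitor or httperf.

```python
# Values copied from Table 6 (all in seconds).
TABLE_6 = {
    "Michigan": {"httperf_conn": 0.074, "httperf_resp": 1.027,
                 "ete_conn": 0.088, "ete_last_byte": 0.953, "ete_ack": 1.026},
    "Duke": {"httperf_conn": 0.102, "httperf_resp": 1.38,
             "ete_conn": 0.117, "ete_last_byte": 1.28, "ete_ack": 1.38},
}


def check_row(row, tolerance=0.005):
    """Compare one Table 6 row against the relationships stated in the text.

    `tolerance` (seconds) is an assumed allowance for rounding in the table.
    """
    # The httperf connection setup time stands in for one round trip.
    rtt = row["httperf_conn"]
    # Extra setup time seen by EtE monitor, in whole milliseconds.
    setup_overhead_ms = round((row["ete_conn"] - row["httperf_conn"]) * 1000)
    # EtE time (ACK) should coincide with the client-observed response time.
    ack_gap = abs(row["ete_ack"] - row["httperf_resp"])
    # EtE time (last byte) should be one round trip below the response time.
    last_byte_gap = abs((row["httperf_resp"] - rtt) - row["ete_last_byte"])
    return {
        "setup_overhead_ms": setup_overhead_ms,
        "ack_matches": ack_gap <= tolerance,
        "last_byte_matches": last_byte_gap <= tolerance,
    }


results = {client: check_row(row) for client, row in TABLE_6.items()}
for client, result in results.items():
    print(client, result)
```

Under these assumptions, both rows satisfy the two relationships within the rounding tolerance, and the setup overhead comes out at 14 ms and 15 ms for the two clients.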

The second experiment was performed to evaluate the reconstruction power of EtE monitor. With its two-pass heuristic method, EtE monitor actively uses the referer field to reconstruct the page composition and to build a Knowledge Base about the web pages and the objects composing them. This information is used during the second pass to group the requests into page accesses more accurately. The question to answer is: how dependent are the reconstruction results on the existence of referer field information? If the referer field is not set in most of the requests, how is the EtE monitor reconstruction process affected? How is the reconstruction process affected by accesses generated by proxies?

To answer these questions, we performed the following experiment. To reduce the inaccuracy introduced by proxies, we first filtered out the requests with via fields, which are issued by proxies, from the original Transaction Logs for both sites. These requests constitute 24% of total requests for the HPL site and 1.1% of total requests for the Support site. We call the resulting logs filtered logs. We then masked the referer fields of all transactions in the filtered logs to study the correctness of reconstruction. We call these modified logs, which do not contain any referer fields, masked logs. We note that the requests with referer fields constitute 56% of the total requests for the HPL site and 69% for the Support site in the filtered logs. EtE monitor then processed both the filtered logs and the masked logs. Table 7 summarizes the results of this experiment.
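The log preparation described above can be sketched as follows. This is an illustration only: the record layout (a dictionary with a `headers` mapping keyed by lowercase header name), the function names, and the sample transactions are all hypothetical and do not reflect the actual Transaction Log format used by EtE monitor.

```python
def filter_proxy_requests(transactions):
    """Drop transactions that carry a via field (assumed to come from proxies)."""
    return [t for t in transactions if "via" not in t["headers"]]


def mask_referers(transactions):
    """Return copies of the transactions with the referer field removed."""
    masked = []
    for t in transactions:
        headers = {k: v for k, v in t["headers"].items() if k != "referer"}
        masked.append({**t, "headers": headers})
    return masked


def referer_share(transactions):
    """Fraction of transactions that carry a referer field."""
    if not transactions:
        return 0.0
    with_referer = sum(1 for t in transactions if "referer" in t["headers"])
    return with_referer / len(transactions)


# Hypothetical sample log: four requests, one of them issued through a proxy.
sample_log = [
    {"url": "/index.html", "headers": {}},
    {"url": "/img/logo.gif", "headers": {"referer": "/index.html"}},
    {"url": "/index.html", "headers": {"via": "1.0 proxy.example"}},
    {"url": "/img/photo.gif", "headers": {"referer": "/index.html"}},
]

filtered_log = filter_proxy_requests(sample_log)  # "filtered logs"
masked_log = mask_referers(filtered_log)          # "masked logs"

print(len(sample_log), len(filtered_log), len(masked_log))
print(referer_share(filtered_log), referer_share(masked_log))
```

Masking works on copies so that the filtered log stays intact and both variants can be fed to the same reconstruction step for comparison.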

Table 7: Experimental results validating the accuracy of EtE monitor reconstruction process for HPL and Support sites.

Metrics	HPL url1	HPL url2	Support url1	Support url2
Reconstructed page accesses (filtered logs)	36,402	17,562	17,601	11,310
EtE time (filtered logs)	3.3 sec	4.1 sec	2.4 sec	3.3 sec
Reconstructed page accesses (masked logs)	33,735	14,727	15,401	8,890
EtE time (masked logs)	3.2 sec	4.1 sec	2.3 sec	3.6 sec

The results for the masked logs in Table 7 show that EtE monitor does a good job of page access reconstruction even when the requests do not have any referer fields. However, with the knowledge introduced by the referer fields in the filtered logs, the number of reconstructed page accesses increases by 9-21% for the considered URLs in Table 7. We also find that the number of reconstructed accesses increases by 11.2-19.8% for all the considered URLs if EtE monitor processes the original logs, without filtering out requests with via fields or masking the referer fields. The difference in EtE time between the two kinds of logs in Table 7 can be explained by the difference in the number of reconstructed accesses. Intuitively, more reconstructed page accesses lead to a more accurate estimate. This observation also calls into question the accuracy of active probing techniques, given their relatively small sample sets.