We performed two groups of experiments: the first validates the accuracy of EtE monitor performance measurements, and the second evaluates its page access reconstruction power.
In the first experiment, we used two remote clients residing at Duke University and Michigan State University to issue a sequence of 40 requests to retrieve a designated web page from the HPLabs external web site. The page consists of an HTML file and 7 embedded images, and its total size is 175 Kbytes. To issue these requests, we used httperf, a tool which measures the connection setup time and the end-to-end time observed by the client for a full page download. At the same time, an EtE monitor measured the performance of the HPLabs external web site. From the EtE monitor measurements, we filtered the statistics about the designated client accesses. Additionally, in EtE monitor, we computed the end-to-end time using two slightly different approaches from those discussed in Section 6.1, reported below as EtE time (ACK) and EtE time (last byte):
The connection setup time reported by EtE monitor is slightly higher (by 14-15 ms) than the actual setup time measured by httperf, since it includes not only the time to establish a TCP connection but also the time to receive the first byte of a request. The EtE time (ACK) coincides with the actual measured response time observed by the client. The EtE time (last byte) is lower than the actual response time by exactly one round trip delay; the connection setup time measured by httperf represents the round trip time for each client, which is 74-102 ms. These measurements match our expectations for EtE monitor accuracy (see Section 6.1). Thus, we have some confidence that EtE monitor accurately approximates the actual response time observed by the client.
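The relationships stated above can be written down as a small consistency check. The following is only a minimal sketch and is not part of EtE monitor or httperf: the class names, field names, tolerance, and sample numbers are all hypothetical and chosen purely for illustration. It assumes, as stated above, that the client-side connection setup time approximates one round trip time.

```python
from dataclasses import dataclass


@dataclass
class ClientMeasurement:
    """Times observed on the client side (milliseconds). Hypothetical structure."""
    setup_ms: float     # connection setup time, used here as the round trip time
    response_ms: float  # end-to-end time for the full page download


@dataclass
class MonitorMeasurement:
    """Times reported on the server side (milliseconds). Hypothetical structure."""
    setup_ms: float
    ete_ack_ms: float
    ete_last_byte_ms: float


def check_expectations(client, monitor, tol_ms=5.0):
    """Compare server-side measurements against the client-side ones.

    tol_ms is an arbitrary illustrative tolerance, not a value from the text.
    """
    rtt_ms = client.setup_ms
    gap_ms = client.response_ms - monitor.ete_last_byte_ms
    return {
        # Extra time the monitor attributes to connection setup.
        "setup_overhead_ms": monitor.setup_ms - client.setup_ms,
        # EtE time (ACK) should coincide with the client's response time.
        "ack_matches_response": abs(monitor.ete_ack_ms - client.response_ms) <= tol_ms,
        # EtE time (last byte) should be lower by about one round trip.
        "last_byte_short_by_rtt": abs(gap_ms - rtt_ms) <= tol_ms,
    }


# Invented numbers, for illustration only.
client = ClientMeasurement(setup_ms=80.0, response_ms=1000.0)
monitor = MonitorMeasurement(setup_ms=94.5, ete_ack_ms=1001.0, ete_last_byte_ms=921.0)
print(check_expectations(client, monitor))
```

With real data, the same check would be run per client, since each client has its own round trip time.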
The second experiment was performed to evaluate the reconstruction power of EtE monitor. The EtE monitor with its two-pass heuristic method actively uses the referer field to reconstruct the page composition and to build a Knowledge Base about the web pages and the objects composing them. This information is used during the second pass to group the requests into page accesses more accurately. The question to answer is: how dependent are the reconstruction results on the presence of referer field information? If the referer field is not set in most of the requests, how is the EtE monitor reconstruction process affected? How is the reconstruction process affected by accesses generated by proxies?
To answer these questions, we performed the following experiment. To reduce the inaccuracy introduced by proxies, we first filtered out the requests with via fields, which are issued by proxies, from the original Transaction Logs for both sites. These requests constitute 24% of total requests for the HPL site and 1.1% of total requests for the Support site. We call the resulting logs filtered logs. Further, we masked the referer fields of all transactions in the filtered logs to study the correctness of reconstruction without them. We call these modified logs, which do not contain any referer fields, masked logs. In the filtered logs, the requests with referer fields constitute 56% of the total requests for the HPL site and 69% for the Support site. EtE monitor then processed both the filtered logs and the masked logs. Table 7 summarizes the results of this experiment.
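The log preparation described above can be sketched as follows. This is a hedged illustration only: the actual Transaction Log format is not given here, so the sketch assumes a hypothetical representation in which each transaction is a dictionary keyed by lower-cased header names. The function names and sample records are likewise invented.

```python
def filter_proxy_requests(records):
    """Drop transactions that carry a via field, i.e. requests issued by proxies."""
    return [r for r in records if "via" not in r]


def mask_referers(records):
    """Return copies of the transactions with the referer field removed."""
    return [{k: v for k, v in r.items() if k != "referer"} for r in records]


def fraction_with(records, field):
    """Fraction of transactions that contain the given field."""
    records = list(records)
    if not records:
        return 0.0
    return sum(1 for r in records if field in r) / len(records)


# Invented sample transactions (hypothetical format).
original_log = [
    {"url": "/index.html", "referer": "/home.html"},
    {"url": "/a.gif", "referer": "/index.html", "via": "1.1 proxy.example"},
    {"url": "/b.gif"},
    {"url": "/c.gif", "referer": "/index.html"},
]

filtered_log = filter_proxy_requests(original_log)  # no proxy-issued requests
masked_log = mask_referers(filtered_log)            # additionally, no referer fields

print(len(original_log) - len(filtered_log), "proxy requests removed")
print(fraction_with(filtered_log, "referer"), "of filtered requests carry a referer")
```

Masking builds new dictionaries instead of modifying the filtered log in place, so both logs remain available for the side-by-side comparison.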
The results for the masked logs in Table 7 show that EtE monitor reconstructs page accesses well even when the requests do not have any referer fields. However, with the knowledge introduced by the referer fields in the filtered logs, the number of reconstructed page accesses increases by 9-21% for the considered URLs in Table 7. We also find that the number of reconstructed accesses increases by 11.2-19.8% for all the considered URLs if EtE monitor processes the original logs without filtering either the via fields or the referer fields. The difference in EtE time between the two kinds of logs in Table 7 can be explained by the difference in the number of reconstructed accesses. Intuitively, more reconstructed page accesses lead to higher accuracy of estimation. This observation also challenges the accuracy of active probing techniques, given their relatively small sample sets.
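One way to make the intuition about sample size concrete is the standard error of a sample mean, which shrinks with the square root of the number of samples. The sketch below is an illustrative assumption, not the analysis behind Table 7: it treats the response times as independent samples, and the function name and values are invented.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation divided by sqrt(n)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    return statistics.stdev(samples) / math.sqrt(n)


# Invented response times (seconds); the larger set repeats the same spread.
few_samples = [1.0, 3.0]
many_samples = [1.0, 3.0] * 4

print(standard_error(few_samples))
print(standard_error(many_samples))
```

Under this assumption, an estimate of mean response time built from a handful of probes is noticeably less stable than one built from many reconstructed page accesses with the same spread.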