Next: 6 Experimental results Up: Puppeteer: Component-based Adaptation for Previous: 4 Experimental environment

5 Data sets

 

We select the set of PowerPoint documents used in our experiments from a collection of Microsoft Office documents that we characterized earlier [11]. The full collection includes 2,167 documents downloaded from 334 Web sites with sizes ranging from 20 KB to 21 MB.

We obtain our HTML documents by re-executing the traces of Web client accesses collected and characterized by Cunha et al. [10]. These traces include accesses from two user groups over a 7-month period, from November 1994 through May 1995, and contain 46,830 unique URLs corresponding to 3,026 Web sites. For every URL that we are able to access (many pages had either disappeared or become corrupted), we download the HTML file and any images it references. We do not download any documents linked from these pages. In this manner we acquire 3,796 HTML files and 15,329 images, comprising 89 MB of data downloaded from 1,009 sites. Documents range in size from a few bytes to 773 KB, including images.
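The distinction between embedded images (downloaded) and linked documents (not downloaded) can be illustrated with a minimal sketch. It is not the tool used for the experiments, which is not described here; the class name `ImageRefCollector`, the helper `image_urls`, and the example URLs are hypothetical, and only `img` tags are treated as image references.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageRefCollector(HTMLParser):
    """Collect absolute URLs of images embedded in one HTML page.

    Hypothetical helper: anchors (<a href=...>) are deliberately ignored,
    mirroring the rule that linked documents are not downloaded.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative references against the page URL.
                self.images.append(urljoin(self.base_url, src))


def image_urls(html_text, base_url):
    """Return the image URLs referenced by html_text, in document order."""
    collector = ImageRefCollector(base_url)
    collector.feed(html_text)
    return collector.images
```

Each URL returned by `image_urls` would then be fetched alongside the page itself, while any `href` targets are left alone.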

Because these data sets are so large, transmitting them at low bandwidth takes prohibitively long. We therefore run our experiments on just 92 PowerPoint documents and 182 HTML documents. For those subsets, the longest experiment requires 138 minutes for PowerPoint and 55 minutes for HTML. For completeness, however, we run one test on the full sets of both document types over a high-bandwidth network, verifying that our selected documents and the full document sets produce similar results.

For our PowerPoint experiments, we select 92 documents by sorting all documents larger than 32 KB into buckets with sizes increasing by powers of 2. We then randomly select 10 documents from each bucket. The largest bucket, consisting of documents with sizes greater than 16 MB, has only 2 documents. Thus, our experimental set has $9 \times 10 + 2 = 92$ members.

For our IE experiments, we select 182 HTML documents from the downloaded set by sorting all documents larger than 4 KB into buckets with sizes increasing by powers of 2. We then randomly select 25 documents from each bucket. The largest bucket, consisting of documents with sizes greater than 512 KB, has only 7 documents. Thus, our experimental set has $7 \times 25 + 7 = 182$ members.
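The bucketed selection described above can be sketched as follows. This is an illustration under stated assumptions, not the script used for the experiments: the function names, the fixed random seed, and the choice to place a document whose size falls exactly on a boundary into the lower bucket are all hypothetical.

```python
import random
from collections import defaultdict


def bucket_index(size, min_size, last_bucket_start):
    """Return the power-of-2 bucket of a document, or None if it is too small.

    Bucket k holds sizes in (min_size * 2**k, min_size * 2**(k + 1)].
    Every size above last_bucket_start shares one final open-ended bucket.
    """
    if size <= min_size:
        return None
    k = 0
    upper = min_size * 2
    while size > upper and upper <= last_bucket_start:
        k += 1
        upper *= 2
    return k


def select_documents(sizes, min_size, last_bucket_start, per_bucket, seed=0):
    """Randomly pick up to per_bucket documents from each size bucket.

    sizes maps a document identifier to its size in bytes. The seed is an
    assumption added here so that the selection is repeatable.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for doc_id, size in sizes.items():
        k = bucket_index(size, min_size, last_bucket_start)
        if k is not None:
            buckets[k].append(doc_id)
    chosen = []
    for k in sorted(buckets):
        docs = sorted(buckets[k])
        # A bucket with fewer than per_bucket documents contributes all of them.
        chosen.extend(rng.sample(docs, min(per_bucket, len(docs))))
    return chosen
```

With a 32 KB minimum, a 16 MB start for the final bucket, and 10 documents per bucket, a collection that has at least 10 documents in each bounded bucket and 2 documents above 16 MB yields 92 selections. The HTML case corresponds to a 4 KB minimum, a 512 KB final bucket, and 25 documents per bucket.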



Eyal DeLara
Tue Jan 23 15:09:44 CST 2001