Check out the new USENIX Web site. next up previous
Next: Embedded components Up: Components Previous: Components and document size

4.6.2 Images

Images are the most common type of non-text data found in Office documents. As Table 4 shows, 34.62% of Word and 77.01% of PowerPoint documents contain at least one image. We do not present results for Excel documents because very few of them contain any images at all.

Figures 9 and 10 show the average number of distinct images and the average size of images for PowerPoint documents. We plot the number of distinct images instead of the total number of images because Office applications cache a single copy of every image regardless of the actual number of times the image appears in the document.

Both plots show similar trends: the number and size of images increase as documents get bigger. These results are consistent with the findings of section 4.2, where images become the dominant contributor to document size as document size increases. The results for Word are similar and are omitted for brevity.

We compared the average size of images in Office documents to the findings of previous Web studies [2,23]. In general, these studies report average image sizes between 5 KB and 22 KB. In comparison, Office documents, especially PowerPoint documents, tend to have larger images. These results suggest that image distillation and other adaptation techniques are at least as important for compound documents as they currently are for Web documents.

We measured the reuse of images across our PowerPoint documents by calculating the Adler-32 checksum [8] of each image's data and counting the number of documents that contain images with the same signature. We found that of the 16,189 images embedded in PowerPoint documents, only 14,016 are distinct, and 1,241 of these distinct images, or 8.85%, appear in more than one document. We also calculated the potential bandwidth savings of a perfect cache for a PowerPoint client that reads all the documents in our dataset that came from the same Web site. We found that 26% of the Web sites get some bandwidth savings from the perfect cache, while 11% of the sites see reductions in required bandwidth greater than 20%.
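
The measurement described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the tooling used in the study: the function names (image_signature, reuse_stats, perfect_cache_savings), the input layout (a dict mapping each document name to the list of its distinct image payloads), and the sample data are all assumptions made for the example. The savings here are computed relative to image bytes only, which may differ from the bandwidth metric used in the paper. The only library call is zlib.adler32 from the Python standard library.

```python
import zlib
from collections import defaultdict


def image_signature(data: bytes) -> int:
    """Adler-32 checksum of the raw image bytes, as an unsigned 32-bit value."""
    return zlib.adler32(data) & 0xFFFFFFFF


def reuse_stats(documents):
    """Return (total images, distinct images, images shared by >1 document).

    `documents` is a hypothetical layout: {document name: [image bytes, ...]},
    where each list already holds one copy of every distinct image in the file.
    """
    docs_per_signature = defaultdict(set)
    total = 0
    for name, images in documents.items():
        for data in images:
            total += 1
            docs_per_signature[image_signature(data)].add(name)
    distinct = len(docs_per_signature)
    shared = sum(1 for docs in docs_per_signature.values() if len(docs) > 1)
    return total, distinct, shared


def perfect_cache_savings(documents):
    """Fraction of image bytes a perfect cache avoids when a client reads
    every document of one site (assumption: only image bytes are counted)."""
    seen = set()
    total_bytes = 0
    fetched_bytes = 0
    for images in documents.values():
        for data in images:
            total_bytes += len(data)
            signature = image_signature(data)
            if signature not in seen:  # first sighting: must be transferred
                seen.add(signature)
                fetched_bytes += len(data)
    if total_bytes == 0:
        return 0.0
    return 1.0 - fetched_bytes / total_bytes


# Made-up sample: two presentations from one site that share a logo image.
site = {
    "a.ppt": [b"logo" * 100, b"chart-a" * 50],
    "b.ppt": [b"logo" * 100, b"chart-b" * 50],
}
print(reuse_stats(site))            # (4, 3, 1)
print(perfect_cache_savings(site))  # fraction of image bytes saved
```

Because Adler-32 is a 32-bit checksum, two different images can in principle collide. A more careful variant of this sketch could compare image lengths or full contents whenever two signatures match.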
  
Figure 9: Average number of images in PowerPoint documents.


  
Figure 10: Average image size in PowerPoint documents.


 
Table 4: Image statistics for Word and PowerPoint documents. The table shows the percentage of documents that have at least one image, the average number of distinct images in documents with images, and the average image size.

                                  Application
Statistic                       Word    PowerPoint
% of documents with images      34.62   77.01
avg. distinct images             6.01   10.62
avg. image size (KB)            21.58   47.82


Eyal DeLara
2000-05-16