Next: Comparing OLE archives and
Up: Experimental results
Previous: Document size
4.2 Size breakdown
Figure 4 shows the breakdown of document sizes
for Word documents. For every size category it
shows the contributions of text, formatting information, embedded
objects, and images to the document's size. We measured similar
breakdowns for PowerPoint and Excel documents, but because of space
concerns we do not include them in this paper. The PowerPoint
documents showed a trend similar to that of
Figure 4, while in Excel documents the text
component accounts for over 95% of the document size in all the size
categories.
Figure 4 shows that small Word documents are
dominated by text and formatting information. For larger Word
documents, however, image and embedded component data become the prevalent
contributors to document size. This data strongly suggests that
efforts to improve access to compound documents should focus on the
image and the embedded component data.
One possible optimization would be to remove the embedded component native
data from documents that are fetched exclusively for reading. As
described in section 2.2, this data is only necessary
when editing an embedded component. Users are still able to
display the document using the cached image of the component. We
measured the savings of this scheme and found that it would reduce
bandwidth requirements for Word and PowerPoint documents
by as much as 35% and 21%, respectively. PowerPoint documents show less
potential benefit because PowerPoint compresses its component data
before storing it in the OLE archive, whereas Word does not use compression.
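The savings estimate for this optimization can be illustrated with a minimal sketch. It assumes that a document's size has already been broken down into per-component byte counts, and that the native data of embedded components is accounted for separately from their cached images. The `SizeBreakdown` class, its field names, and the sample numbers are hypothetical and are not taken from the measurement tools or data used in this study.

```python
from dataclasses import dataclass


@dataclass
class SizeBreakdown:
    """Per-component byte counts for one document (hypothetical layout)."""
    text: int
    formatting: int
    images: int
    embedded_native: int  # native data, needed only when editing a component
    embedded_cached: int  # cached image, sufficient for displaying it

    def total(self) -> int:
        """Size of the complete document in bytes."""
        return (self.text + self.formatting + self.images
                + self.embedded_native + self.embedded_cached)


def read_only_savings(doc: SizeBreakdown) -> float:
    """Fraction of bytes saved if the native data is not transferred."""
    total = doc.total()
    if total == 0:
        return 0.0
    return doc.embedded_native / total


# Made-up numbers for illustration only; not measured data.
example = SizeBreakdown(text=200, formatting=100, images=400,
                        embedded_native=250, embedded_cached=50)
savings = read_only_savings(example)
print(f"read-only fetch saves {savings:.0%} of {example.total()} bytes")
```

The sketch only computes the relative saving for a single document. An aggregate figure over a document collection would presumably weight each document by its size.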
Figure 4:
Size breakdown of Word documents. The plot shows that as documents get bigger, images and embedded component data account for most of the document's size.
Eyal DeLara
2000-05-16