For OLE archives, Office optimizes ``save'' operations by appending
modifications to the end of the file rather than rewriting the whole
file every time. While this optimization allows for much faster
document saves, it can lead to a significant increase in file sizes.
If the user deletes or rewrites a substantial portion of a document
and saves it, the original data, now garbage, will be retained. The
extra data does not pose a problem for clients accessing the document
over random-access file systems, because the application can simply
skip the dead data. Clients accessing documents over protocols that do
not support random access, such as HTTP, are forced to download the
whole document before opening it. As a result, they fetch extra data
that is never used.
In contrast, when a user asks Office to ``save
as,'' a new document is written from scratch, without any garbage that
may have been in the original document.
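The difference between the two operations can be illustrated with a toy
model. The sketch below is not the OLE format and does not reflect how
Office is implemented; the class ToyArchive and its methods are
hypothetical names introduced only for illustration. It assumes that a
fast save appends the new version of a stream and leaves the old bytes
in place, while ``save as'' copies only the live data into a new file.

```python
class ToyArchive:
    """Toy model of an append-on-save archive (NOT the real OLE format)."""

    def __init__(self):
        self.blocks = []  # every block ever written, in file order
        self.live = {}    # stream name -> index of its current block

    def save(self, name, data):
        # Fast save: append the new version. Any older block for this
        # stream stays in the file as garbage.
        self.blocks.append(data)
        self.live[name] = len(self.blocks) - 1

    def delete(self, name):
        # Deleting only drops the reference; the bytes remain in the file.
        del self.live[name]

    def size(self):
        # Total file size, garbage included.
        return sum(len(block) for block in self.blocks)

    def save_as(self):
        # Write a fresh archive that contains only the live data.
        fresh = ToyArchive()
        for name, index in self.live.items():
            fresh.save(name, self.blocks[index])
        return fresh


doc = ToyArchive()
doc.save("body", b"x" * 1000)   # first version of the document body
doc.save("body", b"y" * 400)    # rewrite: the old 1000 bytes become garbage
compact = doc.save_as()
print(doc.size(), compact.size())
```

In this toy model, a client without random access has to fetch
doc.size() bytes, while the document itself only needs
compact.size() bytes.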
We measured the changes in file size for OLE archives by using the
``save as'' operation. In this experiment we only considered
documents that were already in Office 2000 file formats. Other
documents are not included because the ``save as'' operation not only
results in garbage collection but also reformats the documents to the
Office 2000 formats, which may change document size.
Figure 8 shows the results of this experiment. Most
documents get some benefit from garbage collection. Interestingly,
24% of Word documents and 35% of PowerPoint documents achieve savings
greater than 16%.
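The metric behind these numbers can be sketched as follows. This is a
minimal illustration that assumes the saving is the relative reduction
in file size after ``save as''; the function names and the sample sizes
are made up for the example and are not the measured data.

```python
def percent_saved(original_size, compacted_size):
    """Relative size reduction, in percent, after a ``save as''."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return 100.0 * (original_size - compacted_size) / original_size


def share_above(savings, threshold):
    """Percentage of documents whose saving exceeds the threshold."""
    if not savings:
        return 0.0
    above = sum(1 for s in savings if s > threshold)
    return 100.0 * above / len(savings)


# Hypothetical (original, after ``save as'') sizes in bytes.
sizes = [(1000, 800), (2000, 1900), (500, 500), (4000, 3000)]
savings = [percent_saved(before, after) for before, after in sizes]
print(savings)                   # [20.0, 5.0, 0.0, 25.0]
print(share_above(savings, 16))  # 50.0
```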
Figure 8: Percentage saved by garbage collection of OLE archive documents.
Eyal DeLara
2000-05-16