
4.5 Garbage collection

For OLE archives, Office optimizes "save" operations by appending modifications to the end of the file rather than rewriting the whole file every time. While this optimization makes document saves much faster, it can significantly increase file size. If the user deletes or rewrites a substantial portion of a document and then saves it, the original data, now garbage, is retained in the file. This extra data poses no problem for clients that access the document through a file system that supports random access, because the application can simply skip over the dead data. Clients that access documents over protocols without random access, such as HTTP, must instead download the whole document before opening it, and so end up fetching extra data that is never used.

In contrast, when a user asks Office to "save as," a new document is written from scratch, without any of the garbage that may have accumulated in the original document.
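To make the trade-off concrete, the following Python sketch models a document whose fast saves append new copies of changed streams, so superseded and deleted data stay in the file until a "save as" rewrites only the live streams. This is a deliberately simplified, hypothetical model: the class `AppendOnlyDoc`, its methods, and the stream names are illustrative and do not reflect the actual OLE compound-file layout or Office's save logic.

```python
from dataclasses import dataclass, field


# Hypothetical toy model of an append-on-save document; NOT the real OLE format.
@dataclass
class AppendOnlyDoc:
    # Blocks in file order; each block is (stream_name, payload_bytes).
    blocks: list = field(default_factory=list)
    # Maps each stream name to the index of its live (most recent) block.
    live: dict = field(default_factory=dict)

    def fast_save(self, changes: dict) -> None:
        """Append changed streams to the end of the file; old copies remain as garbage."""
        for name, payload in changes.items():
            self.blocks.append((name, payload))
            self.live[name] = len(self.blocks) - 1

    def delete(self, name: str) -> None:
        """Drop a stream from the index; its bytes stay in the file as garbage."""
        self.live.pop(name, None)

    def total_bytes(self) -> int:
        # Bytes a client without random access (e.g. over HTTP) must download.
        return sum(len(payload) for _, payload in self.blocks)

    def live_bytes(self) -> int:
        # Bytes a random-access client actually needs to read.
        return sum(len(self.blocks[i][1]) for i in self.live.values())

    def save_as(self) -> "AppendOnlyDoc":
        """Write a fresh document containing only the live streams (garbage collection)."""
        fresh = AppendOnlyDoc()
        fresh.fast_save({name: self.blocks[i][1] for name, i in self.live.items()})
        return fresh


doc = AppendOnlyDoc()
doc.fast_save({"text": b"x" * 1000, "images": b"y" * 4000})
doc.fast_save({"text": b"x" * 1200})  # user rewrites the text
doc.delete("images")                  # user deletes the images
print(doc.total_bytes(), doc.live_bytes())  # 6200 1200
print(doc.save_as().total_bytes())          # 1200
```

In this toy run, a client without random access would download 6,200 bytes to use 1,200 bytes of live content, while the "save as" copy contains only the 1,200 live bytes.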

We measured the change in file size that the "save as" operation produces for OLE archives. In this experiment we considered only documents that were already in Office 2000 file formats. We excluded other documents because, for them, "save as" not only performs garbage collection but also converts the document to the Office 2000 formats, which may change its size independently of any garbage removed.
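The savings reported in Figure 8 can be read as the share of the original file size removed by "save as". The sketch below assumes that savings are measured relative to the original size and uses hypothetical per-document records (the `Measurement` type and its field names are illustrative, not part of the original study). It computes the percentage saved, applies the Office 2000 filter described above, and counts how many documents exceed a given savings threshold.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    # Hypothetical record for one document; field names are illustrative.
    name: str
    already_office2000: bool  # only these documents are kept, as in the experiment
    size_before: int          # bytes before "save as"
    size_after: int           # bytes after "save as"


def percent_saved(size_before: int, size_after: int) -> float:
    """Percentage of the original size removed by "save as" (assumed definition)."""
    if size_before <= 0:
        raise ValueError("original size must be positive")
    return 100.0 * (size_before - size_after) / size_before


def savings(measurements: list) -> list:
    """Savings for documents already in Office 2000 format; others are skipped."""
    return [percent_saved(m.size_before, m.size_after)
            for m in measurements if m.already_office2000]


def share_above(values: list, threshold: float) -> float:
    """Fraction of documents whose savings exceed `threshold` percent."""
    if not values:
        return 0.0
    return sum(1 for v in values if v > threshold) / len(values)


# Made-up sizes for illustration only.
data = [
    Measurement("a.doc", True, 100_000, 80_000),
    Measurement("b.doc", True, 50_000, 50_000),
    Measurement("c.ppt", True, 200_000, 150_000),
    Measurement("d.doc", False, 40_000, 20_000),  # excluded: would be reformatted
]
print(savings(data))  # [20.0, 0.0, 25.0]
```

Excluding the documents that "save as" would reformat keeps format conversion from being counted as garbage collection.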

Figure 8 shows the results of this experiment. Most documents benefit to some degree from garbage collection. Notably, 24% of Word documents and 35% of PowerPoint documents achieve savings greater than 16%.
  
Figure 8: Percentage saved by garbage collection of OLE archive documents.


Eyal DeLara
2000-05-16