6 Conclusions and discussion
We characterized compound documents generated by the three most
popular applications of the Microsoft Office suite: Word, PowerPoint,
and Excel. Our focus was on identifying opportunities for adapting
these documents to the constraints of bandwidth-limited clients. Our
study encompassed over 12,500 documents, comprising over 4 GB of data,
retrieved from 935 different Web sites.
We identified the following opportunities for adaptation:
1. For large documents, images and components account for the majority of
the data. Moreover, images and image components are the most common
non-text data found in Office documents. These results suggest that
components, and in particular images, should be the main focus of any
adaptation efforts. We are currently adding
quality-aware transcoding and caching of images and components to
Puppeteer and plan to measure the savings of these techniques.
2. For read-only documents, discarding the native component data results
in savings of up to 35% and 21% for Word and PowerPoint, respectively.
3. Garbage collection of OLE archives achieves savings greater than 16% for 24% of Word and
35% of PowerPoint documents.
4. Compression achieves savings of 77% for OLE archives and 90% for
XML. Moreover, once compressed, there is no significant difference in
the sizes of the two file formats. Since XML formats are
significantly easier to parse and manipulate than OLE archives, they are
a more attractive target for adaptation.
5. The structure of Office documents (pages, slides, and sheets) can
be used to download elements on demand and reduce the time that
users wait before they can start working on the document.
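The compression savings quoted in the list above are relative size reductions. As a minimal sketch of how such a figure could be computed for a single file, the following assumes zlib's DEFLATE as the compressor (the text above does not name the algorithm used in the study); the helper name `compression_savings` and the sample data are hypothetical and serve only as illustration.

```python
import zlib


def compression_savings(data: bytes, level: int = 9) -> float:
    """Return the fraction of bytes saved by DEFLATE-compressing `data`.

    Assumption: zlib stands in for whatever compressor is actually used.
    """
    if not data:
        # An empty input has nothing to save; avoid dividing by zero.
        return 0.0
    compressed = zlib.compress(data, level)
    return 1.0 - len(compressed) / len(data)


# Hypothetical stand-in for document content: highly repetitive markup.
sample = b"<p>The quick brown fox</p>\n" * 1000
savings = compression_savings(sample)
print(f"savings: {savings:.1%}")
```

Repetitive markup such as the sample compresses far better than typical documents, so the printed value should not be compared with the percentages reported above.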
Furthermore, our experience studying the Office file formats resulted in the following insights:
1. The data suggests that the ``save as'' operation is largely
misunderstood by users. The large savings that we show from garbage
collection suggest that users do not understand the implications of
fast-save mode (the default), instead believing the ``save as''
operation to be a way to create a copy of the document.
2. The lack of built-in support for compression in OLE archives has
forced designers to implement ad hoc solutions to achieve
high performance. This experience suggests that a compression
feature would be a desirable addition to OLE archives.
3. OLE archive formats are likely to remain the preferred intermediate format
for Office documents, while the XML-based format will likely be the
format of choice for Web publishing. The XML-based format has the
advantage that it can more easily be interpreted by applications other
than Office (e.g., Web browsers). It is also amenable to
widespread browser techniques that improve user-perceived latency,
such as incremental rendering and on-demand fetching. On the other hand, the
current implementation of Office 2000 does not implement
incremental loading or writing of XML-based documents, leading to
higher latencies for opening and storing XML-based documents than
those experienced with similar OLE archive documents. Moreover, some of
the Office formats do not yet have XML equivalents.
Eyal DeLara
2000-05-16