One of the problems in evaluating different fine-grained translation
mechanisms is the lack of good measurements of translation overheads
and other related system costs in existing implementations. The few measurements that do
exist correspond to interpreted systems (except the E
system [15,16]) and usually underestimate the costs
for a high-performance language implementation. For example, a 30%
overhead in a slow (interpreted) implementation may be acceptable for
that system, but will certainly be unacceptable as a 300% overhead
when the execution speed is improved by a factor of ten using a
state-of-the-art compiler.
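To spell out the arithmetic (a simple illustration of our own, assuming the
absolute translation cost stays fixed while only the computation is sped up by
the better compiler):

  overhead before:  0.3T / T      = 30%
  overhead after:   0.3T / (T/10) = 300%

where T is the execution time of the original (interpreted) implementation.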
Another cost factor for fine-grained techniques that has generally
been overlooked is the cost of maintaining mapping tables for
translating between the persistent and transient pointer formats.
Since fine-grained schemes typically translate one pointer at a time,
the mapping tables must contain one entry per pointer. This is likely
to increase the size of the tables significantly, making them harder
to manipulate efficiently.
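As a concrete illustration, a minimal sketch (with hypothetical names, not the
data structure of any particular system) of such a per-pointer mapping table
might look like the following in C++:

  // Sketch only: per-pointer mapping table for a fine-grained scheme.
  // One entry is needed for every pointer translated, not for every page mapped.
  #include <cstdint>
  #include <unordered_map>

  typedef std::uint64_t PersistentId;   // persistent-format pointer (e.g., an OID)

  struct MappingTable {
      std::unordered_map<PersistentId, void*> to_transient;   // swizzling direction
      std::unordered_map<void*, PersistentId> to_persistent;  // needed when unswizzling

      void* lookup(PersistentId pid) const {
          auto it = to_transient.find(pid);
          return (it == to_transient.end()) ? nullptr : it->second;
      }
      void record(PersistentId pid, void* addr) {
          to_transient[pid] = addr;
          to_persistent[addr] = pid;
      }
  };

A coarse-grained (page-wise) scheme, by contrast, needs only one mapping entry
per page, so its table stays small and relatively cheap to probe even for large
data sets.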
We believe that the E system [15,16] is probably the
fastest fine-grained scheme that is comparable to a coarse-grained
address translation scheme; however, it still falls short in terms of
performance. Based on the results presented in [20],
E is about 48% slower than transient C/C++ for hot traversals of the
OO1 database benchmark [4,5] (see footnote 13). This difference is fairly
significant, considering that the overhead of our system is zero for hot
traversals and much smaller (less than 5%) otherwise [8].
We believe that there are several reasons why it is likely to be quite
difficult to drastically reduce the overheads of fine-grained
techniques. Some of these are:
- Several of the basic costs cannot be changed or reduced easily.
  For example, the pointer validity and format checks, which are an
  integral part of fine-grained address translation, cannot be
  optimized away (a sketch of such a check appears after this list).
- There is a general performance penalty (maintaining and searching
large hash tables, etc.) that is typically independent of the
checking cost itself. As mapping tables get larger, it will be
more expensive to probe and update them, especially because
locality effects enter the overall picture (see footnote 14).
- Complex data-flow analysis and code generation techniques are
required to optimize some of the costs associated with the read
barrier used in the implementation. Furthermore, such extra
optimizations may cause unwanted code bloat.
- Although the residency property can be treated as a type so that
Self-style optimizations [6] can be applied to
eliminate residency checking, it is not easy to do so; unlike
types, residency may change across procedure calls depending on
the dynamic run-time state of the application. As such,
residency check elimination is fundamentally a non-local problem
that depends on complex analysis of control flow and data flow.
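The sketch promised above: a minimal, self-contained illustration (the tag
convention and all names are hypothetical, not the design of any particular
system) of the format/residency check that a fine-grained scheme performs at
each pointer use:

  // Sketch only: the per-use format/residency check of a fine-grained scheme.
  // A low-order tag bit marks pointers still in persistent format; the slow
  // path consults a per-pointer mapping table and faults the object in.
  #include <cstdint>
  #include <cstdlib>
  #include <unordered_map>

  static const std::uintptr_t PERSISTENT_TAG = 0x1;
  static std::unordered_map<std::uintptr_t, void*> mapping_table;  // id -> transient address

  static void* fault_object_in(std::uintptr_t id) {
      // Placeholder for reading the object from the persistent store.
      void* addr = std::malloc(64);          // stand-in for the real object
      mapping_table[id] = addr;
      return addr;
  }

  // The read barrier executed at every pointer use; the check itself is the
  // cost that cannot easily be optimized away.
  static void* deref(void* p) {
      std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
      if (bits & PERSISTENT_TAG) {                      // format/validity check
          std::uintptr_t id = bits >> 1;                // strip tag to recover the id
          auto it = mapping_table.find(id);
          return (it != mapping_table.end()) ? it->second : fault_object_in(id);
      }
      return p;                                         // already swizzled
  }

Even when the fast path is only a test and a branch, it is executed on every
pointer use, which is where the baseline cost of fine-grained schemes comes from.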
Based on these arguments, we believe that fine-grained translation
techniques are less attractive than coarse-grained ones for high-performance
implementations of persistence mechanisms.
Taking the other side of the argument, however, it can certainly be
said that fine-grained mechanisms have their advantages. A primary one
is the potential savings in I/O because fine-grained schemes can fetch
data only as necessary. There are at least two other benefits over
coarse-grained approaches:
- fine-grained schemes can support reclustering of objects within
pages, and
- the checks required for fine-grained address translation may
also be able to support other fine-grained features (such as
locking, transactions, etc.) at little extra cost.
In principle, fine-grained schemes can recluster data over shorter
intervals of time than coarse-grained schemes can. However,
clustering algorithms are themselves an interesting topic for
research, and further studies are necessary for conclusive evidence. We
also observe that fine-grained techniques are
attractive for unusually sophisticated systems, e.g., those supporting
fine-grained concurrent transactions. Inevitably, this will incur an
appreciable run-time cost, even if that cost is ``billed'' to multiple
desirable features. Such costs may be reduced in the future if
fine-grained checking is supported in hardware.
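As a rough illustration of how one check can be ``billed'' to several features,
consider a hypothetical barrier (a sketch of our own, not any existing system's
design) that piggybacks a per-object lock check on the residency check that is
already performed at each use:

  // Sketch only: a combined barrier that adds a fine-grained lock check to the
  // residency/format check performed at every pointer use.
  #include <atomic>
  #include <cstdint>

  struct ObjectHeader {
      std::atomic<std::uint32_t> lock_word;   // 0 = unlocked; object data follows
  };

  static const std::uintptr_t PERSISTENT_TAG = 0x1;

  static void* translate(void* p)          { return p; }  // placeholder: translation slow path
  static void wait_for_lock(ObjectHeader*) { }            // placeholder: concurrency-control slow path

  static ObjectHeader* access(void* p) {
      if (reinterpret_cast<std::uintptr_t>(p) & PERSISTENT_TAG)   // residency/format check
          p = translate(p);
      ObjectHeader* h = static_cast<ObjectHeader*>(p);
      if (h->lock_word.load(std::memory_order_acquire) != 0)      // piggybacked lock check
          wait_for_lock(h);                                       // e.g., block or abort the transaction
      return h;
  }

The extra test adds little over the residency check alone, but, as noted above,
the combined barrier still carries an appreciable run-time cost.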
Footnotes
13. The hot traversals are ideal for this purpose because they represent
operations on data that have already been faulted into memory, thereby
avoiding performance impacts related to differences in loading
patterns, etc.
14. Hash tables are known to have extremely poor locality because, by their
very nature, they ``scatter'' related data in different buckets.