One of the problems in evaluating different fine-grained translation
mechanisms is the lack of good measurements of translation overheads
and other related system costs in existing implementations. The few measurements that do
exist correspond to interpreted systems (except the E
system [15,16]) and usually underestimate the costs
for a high-performance language implementation. For example, a 30%
overhead in a slow (interpreted) implementation may be acceptable for
that system, but will certainly be unacceptable as a 300% overhead
when the execution speed is improved by a factor of ten using a
state-of-the-art compiler.
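To spell out the arithmetic (a simple illustration of our own, assuming the
absolute translation cost stays fixed while only the computation is sped up by
the better compiler):

  overhead before:  0.3T / T      = 30%
  overhead after:   0.3T / (T/10) = 300%

where T is the execution time of the original (interpreted) implementation.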
Another cost factor for fine-grained techniques that has generally
been overlooked is the cost of maintaining mapping tables for
translating between the persistent and transient pointer formats.
Since fine-grained schemes typically translate one pointer at a time,
the mapping tables must contain one entry per pointer. This is likely
to increase the size of the tables significantly, making them harder
to manipulate efficiently.
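As a concrete illustration, a minimal sketch (with hypothetical names, not the
data structure of any particular system) of such a per-pointer mapping table
might look like the following in C++:

  // Sketch only: per-pointer mapping table for a fine-grained scheme.
  // One entry is needed for every pointer translated, not for every page mapped.
  #include <cstdint>
  #include <unordered_map>

  typedef std::uint64_t PersistentId;   // persistent-format pointer (e.g., an OID)

  struct MappingTable {
      std::unordered_map<PersistentId, void*> to_transient;   // swizzling direction
      std::unordered_map<void*, PersistentId> to_persistent;  // needed when unswizzling

      void* lookup(PersistentId pid) const {
          auto it = to_transient.find(pid);
          return (it == to_transient.end()) ? nullptr : it->second;
      }
      void record(PersistentId pid, void* addr) {
          to_transient[pid] = addr;
          to_persistent[addr] = pid;
      }
  };

A coarse-grained (page-wise) scheme, by contrast, needs only one mapping entry
per page, so its table stays small and relatively cheap to probe even for large
data sets.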
We believe that the E system [15,16] is probably the
fastest fine-grained scheme that is comparable to a coarse-grained
address translation scheme; however, it still falls short in terms of
performance. Based on the results presented in [20],
E is about 48% slower than transient C/C++ for hot traversals of the
OO1 database benchmark [4,5] (see footnote 13). This difference is fairly
significant, considering that the overhead of our system is zero for hot
traversals and much smaller (less than 5%) otherwise [8].
We believe that there are several reasons why it is likely to be quite
difficult to drastically reduce the overheads of fine-grained
techniques. Some of these are:
- Several of the basic costs cannot be changed or reduced easily.
  For example, the pointer validity and format checks, which are an
  integral part of fine-grained address translation, cannot be
  optimized away (a sketch of such a check appears after this list).
- There is a general performance penalty (maintaining and searching
large hash tables, etc.) that is typically independent of the
checking cost itself. As mapping tables get larger, it will be
more expensive to probe and update them, especially because
locality effects enter the overall picture (see footnote 14).
- Complex data-flow analysis and code generation techniques are
required to optimize some of the costs associated with the read
barrier used in the implementation. Furthermore, such extra
optimizations may cause unwanted code bloat.
- Although the residency property can be treated as a type so that
Self-style optimizations [6] can be applied to
eliminate residency checking, it is not easy to do so; unlike
types, residency may change across procedure calls depending on
the dynamic run-time state of the application. As such,
residency check elimination is fundamentally a non-local problem
that depends on complex analysis of control flow and data flow.
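The sketch promised above: a minimal, self-contained illustration (the tag
convention and all names are hypothetical, not the design of any particular
system) of the format/residency check that a fine-grained scheme performs at
each pointer use:

  // Sketch only: the per-use format/residency check of a fine-grained scheme.
  // A low-order tag bit marks pointers still in persistent format; the slow
  // path consults a per-pointer mapping table and faults the object in.
  #include <cstdint>
  #include <cstdlib>
  #include <unordered_map>

  static const std::uintptr_t PERSISTENT_TAG = 0x1;
  static std::unordered_map<std::uintptr_t, void*> mapping_table;  // id -> transient address

  static void* fault_object_in(std::uintptr_t id) {
      // Placeholder for reading the object from the persistent store.
      void* addr = std::malloc(64);          // stand-in for the real object
      mapping_table[id] = addr;
      return addr;
  }

  // The read barrier executed at every pointer use; the check itself is the
  // cost that cannot easily be optimized away.
  static void* deref(void* p) {
      std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
      if (bits & PERSISTENT_TAG) {                      // format/validity check
          std::uintptr_t id = bits >> 1;                // strip tag to recover the id
          auto it = mapping_table.find(id);
          return (it != mapping_table.end()) ? it->second : fault_object_in(id);
      }
      return p;                                         // already swizzled
  }

Even when the fast path is only a test and a branch, it is executed on every
pointer use, which is where the baseline cost of fine-grained schemes comes from.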
Based on these arguments, we believe that fine-grained translation
techniques are less attractive than coarse-grained ones for high-performance
implementations of persistence mechanisms.
Taking the other side of the argument, however, it can certainly be
said that fine-grained mechanisms have their advantages. A primary one
is the potential savings in I/O because fine-grained schemes can fetch
data only as necessary. There are at least two other benefits over
coarse-grained approaches:
- fine-grained schemes can support reclustering of objects within
pages, and
- the checks required for fine-grained address translation may
also be able to support other fine-grained features (such as
locking, transactions, etc.) at little extra cost.
In principle, fine-grained schemes can recluster data over shorter
intervals of time than coarse-grained schemes can. However,
clustering algorithms are themselves an interesting topic for
research, and further studies are necessary for conclusive evidence. We
also observe that fine-grained techniques are
attractive for unusually sophisticated systems, e.g., those supporting
fine-grained concurrent transactions. Inevitably, this will incur an
appreciable run-time cost, even if that cost is ``billed'' to multiple
desirable features. Such costs may be reduced in the future if
fine-grained checking is supported in hardware.
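As a rough illustration of how one check can be ``billed'' to several features,
consider a hypothetical barrier (a sketch of our own, not any existing system's
design) that piggybacks a per-object lock check on the residency check that is
already performed at each use:

  // Sketch only: a combined barrier that adds a fine-grained lock check to the
  // residency/format check performed at every pointer use.
  #include <atomic>
  #include <cstdint>

  struct ObjectHeader {
      std::atomic<std::uint32_t> lock_word;   // 0 = unlocked; object data follows
  };

  static const std::uintptr_t PERSISTENT_TAG = 0x1;

  static void* translate(void* p)          { return p; }  // placeholder: translation slow path
  static void wait_for_lock(ObjectHeader*) { }            // placeholder: concurrency-control slow path

  static ObjectHeader* access(void* p) {
      if (reinterpret_cast<std::uintptr_t>(p) & PERSISTENT_TAG)   // residency/format check
          p = translate(p);
      ObjectHeader* h = static_cast<ObjectHeader*>(p);
      if (h->lock_word.load(std::memory_order_acquire) != 0)      // piggybacked lock check
          wait_for_lock(h);                                       // e.g., block or abort the transaction
      return h;
  }

The extra test adds little over the residency check alone, but, as noted above,
the combined barrier still carries an appreciable run-time cost.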
Footnotes
13. The hot traversals are ideal for this purpose because they represent
operations on data that have already been faulted into memory, thereby
avoiding performance impacts related to differences in loading
patterns, etc.
14. Hash tables are known to have extremely poor locality because, by their
very nature, they ``scatter'' related data in different buckets.