Next: Modified Optimizations
Up: Cross-Architectural Performance Portability of
Previous: Cross-Architectural Performance Portability of
The Alpha processor architecture [23] and the Intel
x86 processor architecture [1] follow fundamentally different design
philosophies. Alpha, a RISC architecture [18], provides a minimal, simple
instruction set that can be decoded efficiently. Intel x86 is a CISC
architecture designed to perform more complex operations within a
single instruction, and thus includes many more instructions and
instruction formats. While porting Compaq's Fast VM [5] from Alpha to
x86, we encountered several opportunities and pitfalls that stem from this
change in architectural philosophy.
Fortunately, many parts of the JVM required little or no modification
when switching from one architecture to another. These parts include
the class loader, bytecode verifier, and most of the garbage
collector. Still other parts had already been ported elsewhere: we took
advantage of Sun's port of the Java libraries to x86/Linux, so we did
not have to repeat that work. Instead, we concentrated on the major
changes required in the just-in-time (JIT) compiler and closely
related modules, such as the stack unwinding mechanism. Crucial to a
successful (i.e., fast) port of the JIT was preserving the quality of
the generated code, which resulted from the many optimizations performed
by the RISC JIT. We found that some optimizations were straightforward
to port, others required major rework, and still others
were simply unworkable on a CISC architecture. Finally, we found
that some additional optimizations that a RISC machine does not need
at all are critically important for fast CISC code.
The different design philosophies of the Alpha and x86 architectures
impose different design constraints on a Java virtual machine:
- Reduced number of registers: The Alpha architecture has 31
usable general-purpose integer registers, whereas the x86 architecture
has only 8. This difference makes good register allocation crucially
important on the x86.
- Instructions contain multiple operations: On a RISC
architecture, each instruction either loads a value from memory, stores
a value into memory, or executes an arithmetic operation. In contrast, CISC
architectures support complex instructions that integrate these
different RISC functions into a single instruction. Selecting the
optimal instruction for a certain task, therefore, becomes more
difficult on the x86.
- Different addressing modes: Because x86 instructions
decompose into multiple operations, similar instructions are built
from slightly different primitive operations. For example, an addition
can add a value in memory and a value in a register, or add two
registers.
- Non-orthogonality of instruction set: Not all registers can
be used with every instruction, so CISC architectures impose
additional constraints on how data is allocated to registers.
- Source registers get overwritten: Within an arithmetic
instruction, a source register is often overwritten on a CISC
architecture to store the result. If the old value of the
source register is needed, an additional copy step before such an
instruction is required.
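The last point can be illustrated with a minimal sketch of how a three-address operation `dst = src1 op src2`, as a RISC code generator would emit it, might be lowered to x86-style two-address form, where the destination is also the first operand. The function name `lower_to_two_address`, the tuple layout `(opcode, source, destination)`, and the scratch register name `tmp` are hypothetical and chosen for illustration; this is not the code generator of the Fast VM.

```python
# Opcodes for which the operand order does not matter (illustrative subset).
COMMUTATIVE = {"add", "and", "or", "xor", "imul"}

def lower_to_two_address(op, dst, src1, src2, scratch="tmp"):
    """Lower `dst = src1 op src2` to two-address instructions.

    Each emitted tuple (opcode, source, destination) means
    `destination = destination opcode source`; ("mov", s, d) copies s to d.
    `scratch` is a hypothetical free register assumed to be available.
    """
    if dst == src1:
        # The overwritten source is the destination anyway: no copy needed.
        return [(op, src2, dst)]
    if dst == src2:
        if op in COMMUTATIVE:
            # Swap the operands so the overwritten register is the destination.
            return [(op, src1, dst)]
        # dst currently holds src2, which is still needed as the second
        # operand, so the result is computed in a scratch register first.
        return [("mov", src1, scratch), (op, src2, scratch), ("mov", scratch, dst)]
    # General case: copy src1 into dst first so that src1 itself survives.
    return [("mov", src1, dst), (op, src2, dst)]
```

For example, `lower_to_two_address("sub", "c", "a", "b")` yields a copy followed by the subtraction, whereas `lower_to_two_address("add", "a", "a", "b")` needs no copy because the overwritten source is also the destination.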
In addition to these five general aspects of CISC design, the x86 architecture differs from Alpha in the following specific ways:
- 32-bit architecture: Porting the JVM from a 64-bit
architecture to a 32-bit architecture introduces several
complications. Since the Java VM supports 64-bit integer operations, a
32-bit implementation must emulate these operations using multiple
instructions. Furthermore, the 32-bit address space limits the maximum
feasible heap size to 4 GB.
- Limited set of registers per instruction: RISC instructions
support access to either all integer or floating-point registers,
depending on the instruction. On the x86 architecture, certain
instructions require their arguments to be in specific registers. For
example, shift operations require a variable shift amount to be in
register %cl, whereas on the Alpha architecture the shift amount can
be in any register. These restrictions add complexity to
register allocation [8].
- Floating-point stack versus floating-point registers: In
the x86 architecture, all floating-point operations are executed on a
floating-point stack instead of floating-point registers.
Operationally, an arithmetic operation on two floating-point values
pops the top two elements of the floating-point stack, executes the
operation, and pushes the result back onto the stack. The
resulting stack has one element fewer than the original stack. The
register allocator must take these movements into account.
- Floating-point precision toggle: On the Alpha architecture,
the precision of a floating-point operation is always encoded in the
instruction itself, whereas on the x86 architecture it must be set
explicitly by an additional instruction before the operation
executes on the operands on the floating-point stack.
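To make the 32-bit point above concrete, the following sketch emulates a 64-bit addition with two 32-bit halves, propagating the carry from the low half into the high half. On x86 this pattern is typically expressed as an `add` followed by an `adc` (add with carry); the source does not state which sequence the JIT actually emits, so that mapping is an assumption. The helper names `split64`, `join64`, and `add64` are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def split64(value):
    """Split a 64-bit two's-complement integer into (low, high) 32-bit halves."""
    value &= MASK64
    return value & MASK32, value >> 32

def join64(lo, hi):
    """Recombine two 32-bit halves into a signed 64-bit integer (like a Java long)."""
    unsigned = (hi << 32) | lo
    return unsigned - (1 << 64) if unsigned >= (1 << 63) else unsigned

def add64(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit values using only 32-bit-wide results plus a carry bit."""
    lo_sum = a_lo + b_lo
    lo = lo_sum & MASK32               # low 32 bits of the result
    carry = lo_sum >> 32               # 1 if the low-half addition overflowed
    hi = (a_hi + b_hi + carry) & MASK32  # high half absorbs the carry
    return lo, hi
```

As a usage example, `join64(*add64(*split64(0xFFFFFFFF), *split64(1)))` carries out of the low half and returns `0x100000000`. Every 64-bit addition therefore costs at least two instructions and occupies two of the few available registers per operand, which compounds the register pressure described earlier.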
The following two sections describe various optimizations we
implemented in the x86 JVM. Section 2 describes
modifications we made to existing optimization algorithms to port them
from Alpha to x86. Section 3 describes new
optimizations implemented specifically for the x86.