Threaded Interpreter

SableVM's interpreter is a threaded interpreter.
Pure bytecode interpreters suffer from expensive
dispatch costs: on every iteration, the dispatch loop fetches the next bytecode, looks up the associated implementation address in a table (explicitly, or
through a switch statement), then transfers the control to that address. Direct threading[20] reduces this overhead: in the executable code stream, each bytecode is replaced by the address of its associated implementation. In addition, each bytecode implementation ends with the code required to dispatch the next opcode. This is illustrated in figure 2. This technique eliminates the table lookup and the central dispatch loop (thus eliminating a branch instruction to the head of the loop). As these operations are expensive on modern processors, this technique has been shown to be quite effective[20,27].

Figure 2: Pure and threaded interpreters

/* code */

char code[] = {
  ICONST_2, ICONST_2, 
  ICONST_1, IADD, ...
}
char *pc = code;

/* dispatch loop */

while(true) {
  switch(*pc++) {
  case ICONST_1: *++sp = 1; break;
  case ICONST_2: *++sp = 2; break;
  case IADD: 
    sp[-1] += *sp; --sp; break; 
  ...
}}

/* code */

void *code[] = {
  &&ICONST_2, &&ICONST_2, 
  &&ICONST_1, &&IADD, ...
}
void **pc = code;

/* implementations */

goto **(pc++);

ICONST_1: *++sp = 1; goto **(pc++);
ICONST_2: *++sp = 2; goto **(pc++);
IADD: 
  sp[-1] += *sp; --sp; goto **(pc++);
...

(a) Pure bytecode interpreter

(b) Threaded Interpreter

Method bodies are translated to threaded code on their first invocation. We take advantage of this translation to do some optimizations. For example, we precompute absolute branch destinations, we translate overloaded bytecodes like the
GET_FIELD instruction to separate implementation addresses (GET_FIELD_INT, GET_FIELD_FLOAT, ...),
and we inline constant pool references to direct
operand values.

This one pass translation is much simpler than the translation done by even the most naive just-in-time compiler, as each bytecode maps to an address, not a variable sized implementation. However, unlike a JIT, the threaded interpreter still pays the cost of an instruction dispatch for each bytecode. Piumarta[27] has shown a technique to eliminate this overhead within a basic block using selective inlining in a portable manner, at the cost of additional memory⁶. SableVM implements this technique optionally through a compile-time flag, as it might not be appropriate for systems with little memory.