USENIX 2001 Abstract
Reverse-Engineering Instruction Encodings
Wilson C. Hsieh, University of Utah; Dawson R. Engler, Stanford University; and Godmar Back, University of Utah
Abstract
Binary tools such as disassemblers, just-in-time compilers, and executable code rewriters need to have an explicit
representation of how machine instructions are encoded. Unfortunately, writing encodings for an entire instruction
set by hand is both tedious and error-prone. We describe DERIVE, a tool that extracts bit-level instruction
encoding information from assemblers. The user provides DERIVE with assembly-level information about
various instructions. DERIVE automatically reverse-engineers the encodings for those instructions from
an assembler by feeding it permutations of instructions and analyzing the resulting machine code.
DERIVE solves the entire MIPS, SPARC, Alpha, and PowerPC instruction sets, and almost all of the
ARM and x86 instruction sets. Its output consists of C declarations that can be used by binary tools.
To demonstrate the utility of DERIVE, we have built a code emitter generator that takes DERIVE’s
output and produces C macros for code emission, which we have then used to rewrite a Java JIT
backend.
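
The idea of probing an assembler with varied operands and comparing the resulting machine code can be illustrated with a minimal sketch. This is not DERIVE's actual algorithm, which the abstract does not specify. It assumes a hypothetical `mock_assemble_addu` function standing in for a real assembler invocation (it hard-codes the MIPS R-type layout for `addu rd, rs, rt`), and a hypothetical `infer_field_mask` helper that treats the assembler as a black box: it varies one register operand over all 32 values, holds the others fixed, and accumulates the bits that change.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for running a real assembler on "addu rd, rs, rt".
 * MIPS R-type layout: opcode[31:26]=0, rs[25:21], rt[20:16], rd[15:11],
 * shamt[10:6]=0, funct[5:0]=0x21. The inference code below never looks
 * inside this function. */
static uint32_t mock_assemble_addu(unsigned rd, unsigned rs, unsigned rt)
{
    return ((rs & 31u) << 21) | ((rt & 31u) << 16) | ((rd & 31u) << 11) | 0x21u;
}

/* Infer which bits of the encoding hold one operand.
 * which: 0 = rd, 1 = rs, 2 = rt (assumed operand order for this sketch).
 * Every bit that differs from the all-zero baseline while only that
 * operand varies is attributed to the operand's field. */
static uint32_t infer_field_mask(int which)
{
    unsigned ops[3] = {0, 0, 0};
    uint32_t base, mask = 0;
    unsigned r;

    assert(which >= 0 && which < 3);
    base = mock_assemble_addu(0, 0, 0);
    for (r = 0; r < 32; r++) {
        ops[which] = r;
        mask |= base ^ mock_assemble_addu(ops[0], ops[1], ops[2]);
    }
    return mask;
}

/* Bits that never changed across any operand are fixed opcode bits. */
static uint32_t infer_fixed_bits(void)
{
    uint32_t operands = infer_field_mask(0) | infer_field_mask(1)
                      | infer_field_mask(2);
    return mock_assemble_addu(0, 0, 0) & ~operands;
}
```

With the mock assembler above, `infer_field_mask(0)` yields `0x0000F800` for `rd`, `infer_field_mask(1)` yields `0x03E00000` for `rs`, and `infer_field_mask(2)` yields `0x001F0000` for `rt`, while `infer_fixed_bits()` recovers the `funct` value `0x21`. A real tool would also have to handle immediates, variable-length encodings, and operands whose bits are split or transformed, none of which this sketch attempts.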
The Proceedings are published as a collective work, © 2001 by the USENIX Association. All Rights Reserved. Rights
to individual papers remain with the author or the author's employer.
Permission is granted for the noncommercial reproduction of the complete
work for educational or research purposes. USENIX acknowledges all
trademarks within this paper.