USENIX 2001 Technical Program

USENIX 2001 Abstract

Reverse-Engineering Instruction Encodings

Wilson C. Hsieh, University of Utah; Dawson R. Engler, Stanford University; and Godmar Back, University of Utah

Abstract

Binary tools such as disassemblers, just-in-time compilers, and executable code rewriters need to have an explicit representation of how machine instructions are encoded. Unfortunately, writing encodings for an entire instruction set by hand is both tedious and error-prone. We describe DERIVE, a tool that extracts bit-level instruction encoding information from assemblers. The user provides DERIVE with assembly-level information about various instructions. DERIVE automatically reverse-engineers the encodings for those instructions from an assembler by feeding it permutations of instructions and analyzing the resulting machine code. DERIVE solves the entire MIPS, SPARC, Alpha, and PowerPC instruction sets, and almost all of the ARM and x86 instruction sets. Its output consists of C declarations that can be used by binary tools. To demonstrate the utility of DERIVE, we have built a code emitter generator that takes DERIVE’s output and produces C macros for code emission, which we have then used to rewrite a Java JIT backend.
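The idea of recovering an encoding by varying an instruction's operands and comparing the resulting machine code can be illustrated with a small sketch. It is not DERIVE's algorithm: the real tool drives an actual assembler and handles cases this sketch ignores. Here `toy_assemble` is a hypothetical stand-in for the assembler that encodes only the MIPS `addu rd, rs, rt` instruction, and `operand_mask` and `derive_fields` are names invented for this illustration. Each operand is varied on its own while the others are held at zero, and the bits that change are attributed to that operand; the bits that never change are treated as the fixed opcode bits.

```python
def toy_assemble(rd, rs, rt):
    """Hypothetical stand-in for a real assembler: encodes MIPS `addu rd, rs, rt`."""
    return (rs << 21) | (rt << 16) | (rd << 11) | 0x21


def operand_mask(assemble, index, n_operands, values):
    """OR together every bit that changes when operand `index` varies alone."""
    base_ops = [0] * n_operands
    base = assemble(*base_ops)
    mask = 0
    for v in values:
        ops = list(base_ops)
        ops[index] = v
        mask |= assemble(*ops) ^ base  # bits that differ from the baseline
    return mask


def derive_fields(assemble, names, values, width=32):
    """Return per-operand bit masks plus the mask and value of the fixed bits."""
    masks = {
        name: operand_mask(assemble, i, len(names), values)
        for i, name in enumerate(names)
    }
    varying = 0
    for m in masks.values():
        varying |= m
    fixed_mask = ~varying & ((1 << width) - 1)
    fixed_bits = assemble(*([0] * len(names))) & fixed_mask
    return masks, fixed_mask, fixed_bits


masks, fixed_mask, fixed_bits = derive_fields(
    toy_assemble, ["rd", "rs", "rt"], range(32)
)
for name, m in masks.items():
    print(f"{name}: mask 0x{m:08x}")
print(f"fixed bits: 0x{fixed_bits:08x} under mask 0x{fixed_mask:08x}")
```

Treating the assembler as a black box is the point of the sketch: nothing in `derive_fields` knows the instruction layout, yet the register fields and the fixed bits fall out of the comparisons. Exhaustively trying all 32 register numbers is affordable here; wider operands such as immediates would call for a smarter choice of probe values.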

The Proceedings are published as a collective work, © 2001 by the USENIX Association. All Rights Reserved. Rights to individual papers remain with the author or the author's employer. Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes. USENIX acknowledges all trademarks within this paper.

