2006 USENIX Annual Technical Conference Abstract
Pp. 289-300 of the Proceedings
Awarded Best Paper!
Replay Debugging for Distributed Applications
Dennis Geels, Gautam Altekar, Scott Shenker, and Ion Stoica, University of California, Berkeley
Abstract
We have developed a new replay debugging tool, liblog, for
distributed C/C++ applications. It logs the execution of deployed
application processes and replays them deterministically, faithfully
reproducing race conditions and non-deterministic failures, enabling
careful offline analysis.
To our knowledge, liblog is the first replay tool
to address the requirements of large distributed systems: lightweight
support for long-running programs, consistent replay of arbitrary
subsets of application nodes, and operation in a mixed environment of
logging and non-logging processes. In addition, it requires no
special hardware or kernel patches, supports unmodified application
executables, and integrates GDB into the replay mechanism for
simultaneous source-level debugging of multiple processes.
This paper presents liblog's design, an evaluation of its runtime
overhead, and a discussion of our experience with the tool to date.
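The log-then-replay idea summarized above can be illustrated with a toy sketch. This is not liblog's mechanism or API: liblog works on unmodified C/C++ executables, whereas this example wraps calls explicitly in Python, and the names `ReplayLog`, `call`, and `app` are hypothetical. It only shows the general principle that recording the results of nondeterministic calls lets a later run reproduce the same execution.

```python
import random
import time


class ReplayLog:
    """Records results of nondeterministic calls, or replays them in order."""

    def __init__(self, entries=None):
        # With no entries we are logging; with entries we are replaying.
        self.replaying = entries is not None
        self.entries = list(entries) if self.replaying else []
        self.cursor = 0

    def call(self, name, fn, *args):
        """Run fn and log its result, or return the logged result on replay."""
        if self.replaying:
            logged_name, value = self.entries[self.cursor]
            if logged_name != name:
                # The replayed run asked for a different call than was logged.
                raise RuntimeError(
                    f"replay diverged: expected {logged_name}, got {name}"
                )
            self.cursor += 1
            return value
        value = fn(*args)
        self.entries.append((name, value))
        return value


def app(log):
    """Toy application whose result depends on nondeterministic inputs."""
    started = log.call("time", time.time)
    delay = log.call("randint", random.randint, 1, 100)
    return started + delay


# Logging run: nondeterministic results are captured as they occur.
recording = ReplayLog()
original = app(recording)

# Replay run: the same results are fed back, so the outcome is identical.
replay = ReplayLog(recording.entries)
assert app(replay) == original
```

In this sketch the replayed run never touches the clock or the random number generator, which is why a failure that depended on those values would recur on every replay.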
The Proceedings are published as a collective work, © 2006 by the USENIX Association. All Rights Reserved. Rights to individual papers remain with the author or the author's employer. Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes. USENIX acknowledges all trademarks within this paper.