4th USENIX Symposium on Networked Systems Design & Implementation
Pp. 257–270 of the Proceedings
WiDS Checker: Combating Bugs in Distributed Systems
Xuezheng Liu, Wei Lin, Aimin Pan, and Zheng Zhang, Microsoft Research Asia
Abstract
Despite many efforts, the predominant practice of debugging a distributed system is still printf-based log mining, which is both tedious and error-prone. In this paper, we present WiDS Checker, a unified framework that can check distributed systems through both simulation and reproduced runs from real deployment. All instances of a distributed system can be executed within one simulation process, multiplexed properly to observe the “happensbefore” relationship, thus accurately reveal full system state. A versatile script language allows a developer to refine system properties into straightforward assertions, which the checker inspects for violations. Combining these two components, we are able to check distributed properties that are otherwise impossible to check. We applied WiDS Checker over a suite of complex and real systems and found non-trivial bugs, including one in a previously proven Paxos specification. Our experience demonstrates the usefulness of the checker and allows us to gain insights beneficial to future research in this area.
- View the full text of this paper in HTML and PDF. Listen to the presentation in MP3 format.
Until April 2008, you will need your USENIX membership identification in order to access the full papers.
The Proceedings are published as a collective work, © 2007 by the USENIX Association. All Rights Reserved. Rights to individual papers remain with the author or the author's employer. Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes. USENIX acknowledges all trademarks within this paper.
|