Next: Conclusions Up: Testing Stage Previous: Test Queries

Debugging Support

When developers of traditional applications need to understand application behavior, they often employ source-level debuggers. A debugger provides a developer the ability to study program behavior at a microscopic level. Specific abilities of common debuggers include single-stepping through program execution, live inspection of complete program state, including memory and registers, and post-mortem analysis of program state via core files.

How might we provide similar capabilities to developers of networked applications? As a first, naïve, approach, consider building a debugging system for distributed applications using traditional debuggers and a remote invocation facility such as ssh. In this approach, the remote invocation facility is used to attach a debugger process to each process of the distributed application. Unfortunately, this approach does not work well. To illustrate, we consider how such a system deals with single-stepping.

How do we implement single-stepping in this system? This is, perhaps, a trick question. While single-stepping is well-defined for a single process, there is no clear analogue for an application with multiple processes. There is no obvious order in which we could instruct the processes to execute their next steps, because concurrent execution is rife with the potential for race conditions. Hence, it does not matter which implementation we choose: single-stepping is inherently an inappropriate debugging primitive for distributed applications. One important challenge, then, is to find a set of primitives that provide meaningful and useful semantics for distributed applications.
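To make the ordering problem concrete, the following is a minimal sketch, not part of the original system. It assumes two hypothetical processes, P1 and P2, that each read a shared counter and then write back the value plus one. The helper names `interleavings` and `run` are illustrative only. The sketch enumerates every order in which a debugger could single-step the two processes and records the resulting counter value.

```python
def interleavings(a, b):
    """Yield every merge of step lists a and b that keeps each list's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute a schedule of (process, operation) steps on a shared counter."""
    shared = 0
    local = {}  # per-process copy of the value it last read
    for proc, op in schedule:
        if op == "read":
            local[proc] = shared
        else:  # "write": store the previously read value plus one
            shared = local[proc] + 1
    return shared


# Hypothetical two-step programs, one per process (assumption for illustration).
p1 = [("P1", "read"), ("P1", "write")]
p2 = [("P2", "read"), ("P2", "write")]

schedules = list(interleavings(p1, p2))
outcomes = sorted({run(s) for s in schedules})
print(len(schedules), "possible step orders; final counter values:", outcomes)
```

Even with two processes of two steps each, there are several valid step orders, and they do not all produce the same final state. Any single order a debugger imposes is therefore only one of many executions the real application might exhibit.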

One possibility is to support a less demanding form of application debugging. For example, instead of running a traditional debugger at each node, we could log messages, which may provide a useful first approximation to single-stepping. Clearly, capturing all the messages generated at all nodes could be very expensive for some applications. Instead, we may limit the log to messages associated with certain requests, restricting it either by message type or by node, using the techniques described for validation in Section 3.1. Sahai et al. [8] propose a method of capturing all the messages associated with a request that may be useful for this approach. Either synchronized clocks or Lamport clocks could be used to timestamp events in these logs. Developers could perform real-time visualization of the logs to observe their application in operation. Alternatively, if all messages are collected, it might be possible to diagnose problems by replaying the logs within a simulator, where traditional debugging would be possible.
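As an illustration of the logging idea, here is a minimal sketch under stated assumptions. It is not the authors' implementation, and it does not reproduce the techniques of Section 3.1 or of Sahai et al. The classes `Node` and `LogEntry`, the function `merged_log`, and the message types "query" and "heartbeat" are hypothetical names chosen for the example. Each node keeps a Lamport clock, logs only the message types it has been asked to log, and the per-node logs are merged into one timeline ordered by timestamp.

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    timestamp: int  # Lamport clock value at the event
    node: str
    event: str      # "send" or "recv"
    msg_type: str


@dataclass
class Node:
    name: str
    logged_types: frozenset = frozenset()  # empty means "log every type"
    clock: int = 0
    log: list = field(default_factory=list)

    def _record(self, event, msg_type):
        # Restrict the log by message type to limit logging cost.
        if not self.logged_types or msg_type in self.logged_types:
            self.log.append(LogEntry(self.clock, self.name, event, msg_type))

    def send(self, msg_type):
        self.clock += 1  # Lamport rule: tick before each send
        self._record("send", msg_type)
        return (self.clock, msg_type)  # the timestamp travels with the message

    def recv(self, message):
        sent_at, msg_type = message
        # Lamport rule: jump past the sender's timestamp on receipt
        self.clock = max(self.clock, sent_at) + 1
        self._record("recv", msg_type)


def merged_log(*nodes):
    """Combine per-node logs into one timeline; ties are broken by node name."""
    entries = [e for n in nodes for e in n.log]
    return sorted(entries, key=lambda e: (e.timestamp, e.node))


a = Node("A", logged_types=frozenset({"query"}))
b = Node("B", logged_types=frozenset({"query"}))
b.recv(a.send("query"))
a.recv(b.send("heartbeat"))  # advances both clocks but is not logged
a.recv(b.send("query"))

timeline = [(e.timestamp, e.node, e.event) for e in merged_log(a, b)]
for row in timeline:
    print(row)
```

In the merged timeline every receive appears after its matching send, which is the property a developer needs when reading the log as a coarse substitute for single-stepping. Unlogged messages still advance the clocks, so filtering by type does not disturb the ordering of the messages that are kept.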


mukesh agrawal 2003-06-17