The Internet is not living up to its potential. While the Web has been a tremendous success, providing millions of non-technical users with convenient access to information and the ability to perform transactions on-line, the number of truly distributed applications that have succeeded on the Internet is shockingly small. Even those few applications that support interaction among clients, such as chat rooms, auctions, and file-sharing services, require all operations to pass through a common server. As the success of content distribution networks (CDNs) and peer-to-peer (P2P) applications has shown, there is clearly a great demand for large-scale distributed applications. The major barrier to supporting these, and even richer, applications on the Internet is the difficulty of designing, building, testing, and maintaining distributed applications using the tools that comprise the current state of the art.
We can draw a parallel between the complex task of product development and computer application development. In general, manufacturers explicitly manage the life-cycle of their products. Moreover, they typically have specific tools to support different life-cycle phases, e.g., monitoring tools and statistical packages for quality control, and specialized CAD tools for product design. We believe that the analogous life-cycle for an application would have five stages: the design stage, the implementation stage, the testing stage, the deployment and operation stage, and the maintenance and evolution stage. While good tools exist for the life-cycle stages of traditional non-networked applications (e.g., debuggers, profilers, and logging tools), no comparable tools exist for distributed applications on the Internet. The goal of our work is to provide a tool chain that supports each of the stages in the life-cycle of Internet applications.
Supporting the life-cycle stages of applications in traditional distributed environments has received a great deal of attention in the past. A wide variety of tools is available for such systems, ranging from simple communication libraries such as MPI (for scientific computing) to comprehensive environments such as CORBA (for enterprise applications). However, these tools target smaller-scale, mostly closed environments, which are fundamentally different from the Internet.
Recent efforts have begun to address these same challenges in the Internet context. For example, efforts in DHTs [1] and self-organized overlays [3] are developing a collection of building blocks that help in the implementation of large P2P applications. Similarly, simulation tools like ns-2 [6] and open testbeds like Emulab [10] and PlanetLab [7] have provided excellent platforms for the comparison of different design choices. However, the research community has largely overlooked the later stages of the life-cycle - specifically, testing, deployment, and evolution of these applications. Additionally, possibilities for integration of tools from different life-cycle stages are as yet unexploited. For example, the models of system behavior generated during the design stage may prove useful in debugging errant behavior during operation.
In this paper, we describe some of the challenges in addressing the needs of distributed applications in the later life-cycle stages. Our initial work in supporting distributed applications has concentrated on the problem of maintenance and evolution - specifically the problem of upgrading a distributed application and possibly rolling back an upgrade. We describe the challenges in addressing this problem as well as some of our initial solutions in Section 2. In Section 3, we discuss some of the issues in our next area of focus - testing and debugging deployments of distributed applications. We summarize our observations and conclude in Section 4.