

Introduction

Federated, geographically distributed computing platforms such as PlanetLab [3] and the Grid [7,8] have recently become popular for evaluating and deploying network services and scientific computations. As the size, reach, and user population of such infrastructures grow, resource discovery and resource selection become increasingly important. Although a number of resource discovery and allocation services have been built [1,11,15,22,28,33], there is little data on the utilization of the distributed computing platforms they target. Yet the design and efficacy of such services depend on the characteristics of the target platform. For example, if resources are typically plentiful, then there is less need for sophisticated allocation mechanisms. Similarly, if resource availability and demands are predictable and stable, there is little need for aggressive monitoring.

To inform the design and implementation of emerging resource discovery and allocation systems, we examine the usage characteristics of PlanetLab, a federated, time-shared platform for "developing, deploying, and accessing" wide-area distributed applications [3]. In particular, we investigate variability of available host resources across nodes and over time, how that variability interacts with resource demand of several popular long-running services, and how careful application placement and migration might reduce the impact of this variability. We also investigate the feasibility of using stale or predicted measurements to reduce overhead in a system that automates service placement and migration.

Our study analyzes a six-month trace of node, network, and application-level measurements. In addition to presenting a detailed characterization of application resource demand and free and committed node resources over this time period, we analyze this trace to address the following questions: (i) Could informed service placement--that is, using live platform utilization data to choose where to deploy an application--outperform random placement? (ii) Could migration--that is, moving deployed application instances to different nodes in response to changes in resource availability--potentially benefit some applications? (iii) Could we reduce the overhead of a service placement system by using stale or predicted data to make placement and migration decisions? We find:

The remainder of this paper is organized as follows. Section 2 describes our data sources and methodology. Section 3 surveys platform, node, and network resource utilization behavior; addresses the usefulness of informed service placement; and describes resource demand models for three long-running PlanetLab services--CoDeeN [26], Coral [10], and OpenDHT [20]--that we use there and in subsequent sections. Section 4 investigates the potential benefits of periodically migrating service instances. Section 5 analyzes the feasibility of making placement and migration decisions using stale or predicted values. Section 6 discusses related work, and in Section 7 we conclude.

David Oppenheimer 2006-04-14