The utility allocates each service a slice of its resources, including shares of memory, CPU time, and available throughput from storage units. Slices provide performance isolation and enable the utility to use its resources efficiently. The slices are chosen to allow each hosted service to meet service quality targets (e.g., response time) negotiated in Service Level Agreements (SLAs) with the utility. Slices vary dynamically to respond to changes in load and resource status. This paper addresses the provisioning problem: how much resource does a service need to meet SLA targets at its projected load level? A closely related aspect of utility resource allocation is assignment: which servers and storage units will provide the resources to host each service?
Previous work addresses various aspects of utility resource management, including mechanisms to enforce resource shares (e.g., [7,9,36]), policies to provision shares adaptively [12,21,39], admission control with probabilistically safe overbooking [4,6,34], scheduling to meet SLA targets or maximize yield [21,22,23,32], and utility data center architectures [5,25,30].
The key contribution of this paper is to demonstrate the potential of a new model-based approach to provisioning multiple resources that interact in complex ways. The premise of model-based resource provisioning (MBRP) is that internal models capturing service workload and behavior can enable the utility to predict the effects of changes to the workload intensity or resource allotment. Experimental results illustrate model-based dynamic provisioning of memory and storage shares for hosted Web services with static content. Given adequate models, this approach may generalize to a wide range of services including complex multi-tier services [29] with interacting components, or services with multiple functional stages [37]. Moreover, model-based provisioning is flexible enough to adjust to resource constraints or surpluses exposed during assignment.
This paper is organized as follows. Section 2 motivates the work and summarizes our approach. Section 3 outlines simple models for Web services. Section 4 describes a resource allocator based on the models and demonstrates its behavior in various scenarios. Section 5 describes our prototype and presents experimental results. Section 6 sets our approach in context with related work, and Section 7 concludes.