Motivation

The installation, organization, and management of disk images is a critical component of modern virtualization technologies. In typical configurations, it is the disk images (and in particular the root disk image) that define the identity of a particular virtual machine instance. To be effective, a storage virtualization system must be extensible, able to scale to a large number of virtual machine instances, and support advanced storage features such as replication, snapshotting, and migration. It is also highly desirable that such a system be able to rapidly clone a disk image so that multiple virtual machines may use it as a template for their own images.

The introduction of cloud computing magnifies the importance of scalability and efficiency in storage virtualization. Instead of dozens of virtual machines, cloud environments are designed to support thousands (if not hundreds of thousands [1]) of virtual machines, and so will require many thousands of virtual disk images, as well as sufficient infrastructure to provide backup and management.

Today's virtualization environments have a variety of mechanisms to provide disk images. Users may use raw hardware partitions physically located on the host's disks, partitions provided by an operating system's logical volume manager [4], or partitions accessed via a storage area network. In addition to these raw partitions, many hypervisors provide copy-on-write mechanisms that allow a base image to be used as a read-only template for multiple logical instances, each of which stores only its own modifications.
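
To make the mechanism concrete, the following Python sketch (with illustrative names; it does not correspond to any particular hypervisor's implementation) shows reads of unmodified blocks falling through to the shared base image while writes land in a per-instance overlay:

    BLOCK_SIZE = 4096

    class CowDisk:
        """Per-instance copy-on-write layer over a shared base image."""

        def __init__(self, base_path):
            self.base = open(base_path, 'rb')   # shared, read-only template
            self.overlay = {}                   # per-instance modified blocks

        def read_block(self, n):
            # Serve modified blocks from the overlay; fall through to the
            # base image for everything else.
            if n in self.overlay:
                return self.overlay[n]
            self.base.seek(n * BLOCK_SIZE)
            return self.base.read(BLOCK_SIZE)

        def write_block(self, n, data):
            assert len(data) == BLOCK_SIZE
            self.overlay[n] = data              # base image is never touched

A real hypervisor persists the overlay on disk rather than in memory, but the structure is the same: every instance that diverges from its template pays for its own copy of each modified block, which is the source of the drift described next.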

We have previously experimented with both file- and block-based copy-on-write technologies [9] for managing the life cycle of servers. While we found such ``stackable'' technologies to be very effective for initial installation, the per-instance copy-on-write layers tended to drift. For example, over the lifetime of the Fedora 7 Linux distribution there were over 500MB of software updates to the base installation of 1.8GB. While that represents only about 27% change over a year of deployment, the effect is greatly magnified in a large-scale virtualization environment: a thousand instances applying the same updates would accumulate roughly 500GB of largely duplicated data in their private copy-on-write layers.

Our assessment at the time was that simple stacking wasn't sufficient, and that a content addressable storage (CAS) approach to coalescing duplicate data between multiple disk images would provide a solution to the incremental drift of virtual disks. Additionally, the nature of CAS would obviate the need for end-users to start with a template image, as any duplication would be identified and coalesced by the CAS back end. Furthermore, CAS solutions lend themselves to rapid cloning and snapshotting, and can be configured to implicitly provide temporal backups of images.
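
As a rough illustration of why CAS has these properties, consider the following sketch (again in Python, with hypothetical names rather than our prototype's actual interfaces): blocks are stored once under a hash of their contents, an image is an ordered list of digests, and a clone or snapshot is just a copy of that list:

    import hashlib

    class CasStore:
        """Blocks keyed by content hash, so identical blocks across
        images coalesce automatically."""

        def __init__(self):
            self.blocks = {}                      # digest -> block contents

        def put(self, data):
            digest = hashlib.sha1(data).hexdigest()
            self.blocks.setdefault(digest, data)  # duplicate writes cost nothing
            return digest

        def get(self, digest):
            return self.blocks[digest]

    class CasImage:
        """A disk image is just an ordered list of block digests."""

        def __init__(self, store, digests=None):
            self.store = store
            self.digests = list(digests or [])

        def write_block(self, n, data):
            while len(self.digests) <= n:
                self.digests.append(None)         # sparse: unwritten blocks
            self.digests[n] = self.store.put(data)

        def read_block(self, n):
            return self.store.get(self.digests[n])

        def clone(self):
            # Copies only the digest list, never the blocks themselves,
            # which is why cloning and snapshotting are nearly free.
            return CasImage(self.store, self.digests)

Because deduplication happens at write time, two images built from the same distribution share all unchanged blocks without starting from a common template, and temporal backups reduce to retaining old digest lists.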

Others have looked at using CAS solutions for archival of virtual machine images [12] and managing disk images [11]. Nath et al. evaluate the use and design tradeoffs of CAS in managing a large set of VM-based systems in an enterprise environment [8]. In all of these cases, the authors used content addressable storage as a sort of library from which disk images could be ``checked out''. We were more interested in a live solution where the disk image is always directly backed by content addressable storage, so that no check-in or check-out transactions are necessary.

The rest of this paper is organized as follows: Section 2 provides the results of our preliminary analysis comparing the amount of duplication present in several loosely related disk images. Section 3 describes our prototype implementation of a content addressable image management system for virtual machines. Section 4 gives our preliminary performance analysis of the prototype, and Section 5 describes our status and future work.
