We now consider the problem of synchronous upgrades, and in particular an upgrade that is both synchronous and not backwards compatible. An example of such an upgrade would be switching an application's routing protocol from distance vector to link state.
In designing a solution, we draw inspiration from how Internet routing has been upgraded. In moving from IPv4 to IPv6, the Internet has allowed both protocols to operate simultaneously on nodes. Similarly, in our approach, the upgrade process begins with simultaneous execution of old and new versions on the application nodes. The simultaneous execution provides an opportunity to bootstrap the new application instance and thus minimize unavailability due to the upgrade. After the new version is ready to run, we terminate the old instance.
To support simultaneous execution, we employ virtual machines (VMs). VMs provide the illusion of an independent computing machine while running as a process on some other machine. In this context, the VM is referred to as a guest, and the other machine is called the host.
For our VM, we choose User Mode Linux (UML) [4], which provides a virtual Linux machine running as a process on a Linux host. Two features of UML are required for our purpose. First, it provides the ability to route network packets between the host and guest. Second, it supports copy-on-write filesystem images. To use the copy-on-write facility, the user specifies a base filesystem image and a copy-on-write file. Both files are stored in the host filesystem. The base filesystem is treated as read-only, and any changes to the guest filesystem are written to the copy-on-write file.
To employ UML for upgrade/rollback, the distributed application is installed in a UML VM. Before a software upgrade, we duplicate the copy-on-write file and create a snapshot of the running virtual machine process. These files are saved in case a rollback is later required.
Additional copies of the process snapshot and copy-on-write file are then made for the VM that will run the upgraded application. We initialize this VM, using the copied files and the same base filesystem image as the original VM. Next, we perform the software upgrade inside the second VM. Because the two VMs use different copy-on-write files, changes to the filesystem by either VM are not seen in the other VM.
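As a rough illustration, the file preparation described above might be sketched as follows. This is a minimal sketch under stated assumptions, not our implementation: the function name `prepare_upgrade_files`, the directory layout, and the role names are hypothetical, and the process snapshot is assumed to already exist as an ordinary file on the host.

```python
import shutil
from pathlib import Path


def prepare_upgrade_files(cow_file, snapshot_file, work_dir):
    """Copy the copy-on-write file and the process snapshot twice:
    one set is kept for rollback, the other is used to start the VM
    that will run the upgraded application.

    All names and the directory layout here are hypothetical.
    """
    work_dir = Path(work_dir)
    copies = {}
    for role in ("rollback", "upgrade"):
        role_dir = work_dir / role
        role_dir.mkdir(parents=True, exist_ok=True)
        cow_copy = role_dir / Path(cow_file).name
        snap_copy = role_dir / Path(snapshot_file).name
        # copy2 copies file contents and metadata. It is not guaranteed to
        # preserve sparseness, which may matter for large copy-on-write files.
        shutil.copy2(cow_file, cow_copy)
        shutil.copy2(snapshot_file, snap_copy)
        copies[role] = {"cow": cow_copy, "snapshot": snap_copy}
    return copies
```

Because each VM receives its own copy of the copy-on-write file, writes made through one copy are invisible through the others, while the base filesystem image is shared and never modified.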
At this point, the system is ready to begin simultaneous execution. If, however, both versions of the application listen on the same network port, we must arbitrate access to that port. With some assistance from the application developer, we can construct viable approaches for both datagram communication (UDP) and byte-stream communication (TCP).
For UDP, we require that the application have a way of identifying and dropping messages from incompatible versions. The application might, for example, include a version number in each message. We then have the host kernel deliver all datagrams for the application to both VMs.
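The version check for UDP could look like the sketch below. The message layout is an assumption made purely for illustration: a hypothetical two-byte version field in network byte order, followed by the payload. Each application instance drops any datagram whose version does not match its own.

```python
import struct

# Hypothetical header: a 2-byte unsigned version number, network byte order.
HEADER = struct.Struct("!H")


def encode_message(version, payload):
    """Prefix the payload with the sender's application version."""
    return HEADER.pack(version) + payload


def decode_if_compatible(datagram, my_version):
    """Return the payload if the datagram carries my_version, else None."""
    if len(datagram) < HEADER.size:
        return None  # too short to contain a version field; drop it
    (version,) = HEADER.unpack_from(datagram)
    if version != my_version:
        return None  # sent by an incompatible version; drop it
    return datagram[HEADER.size:]
```

With this scheme the host can deliver every datagram to both VMs, and each instance keeps only the messages intended for its own version.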
For TCP communication, we cannot simply deliver packets to both VMs. Spurious packets would be processed by the VM's TCP stack, possibly confusing it. Thus the filtering must be done on the host machine. To support this filtering, the application developer provides the host machine with filters that the host machine can use to route application requests to the appropriate application instance. To handle connection establishment, connection requests (SYN packets) are answered by the host machine. After the first request packet from the remote end is received, the application version is identified. The host machine then spoofs a connection request from the remote end to the appropriate VM, discards the VM's response, and forwards the request packet to the VM.
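The classification step performed on the host might be sketched as follows. This models only the decision of which VM should receive a connection, given the first request packet; it does not answer SYN packets or spoof connection requests. The filter predicates, the VM names, and the `PROTO/1` and `PROTO/2` request prefixes are hypothetical stand-ins for whatever the application developer supplies.

```python
from typing import Callable, Dict, Optional

# A developer-supplied filter inspects the first request bytes of a connection.
Filter = Callable[[bytes], bool]


def select_vm(first_request: bytes, filters: Dict[str, Filter]) -> Optional[str]:
    """Return the name of the first VM whose filter accepts the request,
    or None if no filter matches."""
    for vm_name, matches in filters.items():
        if matches(first_request):
            return vm_name
    return None


# Hypothetical filters: the two versions are distinguished by a request prefix.
example_filters = {
    "old_vm": lambda req: req.startswith(b"PROTO/1 "),
    "new_vm": lambda req: req.startswith(b"PROTO/2 "),
}
```

Only after `select_vm` has identified the target would the host establish the connection to that VM and forward the buffered request packet.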
Once the upgrade is complete, we terminate the VM running the older software. How, though, do we determine that an upgrade is complete? Our present approach is to consider the message rate of the old application. As new application nodes enter the system, they choose to run the new application version. Concurrently, old nodes exit the system, decreasing the rate of messages for the old application version. Thus, over time, the message rate for the new version increases, and the message rate for the old version decreases. When the message rate for the old application version at a node drops below a threshold, the node terminates the old application version.
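The termination test could be sketched with a sliding-window rate estimate, as below. The class name, the window length, and the threshold are hypothetical parameters. Timestamps are passed in explicitly so that the logic is easy to test. A node would presumably begin checking only after at least one full window has elapsed, since an empty window trivially reports a rate of zero.

```python
from collections import deque


class MessageRateMonitor:
    """Tracks the rate of messages for the old application version."""

    def __init__(self, window_seconds, threshold_per_second):
        self.window = window_seconds
        self.threshold = threshold_per_second
        self.timestamps = deque()

    def record(self, now):
        """Note that an old-version message arrived at time `now`."""
        self.timestamps.append(now)

    def rate(self, now):
        """Messages per second over the most recent window."""
        # Discard arrivals that have fallen out of the window.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        return len(self.timestamps) / self.window

    def should_terminate_old(self, now):
        """True when the old version's message rate is below the threshold."""
        return self.rate(now) < self.threshold
```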
To ascertain the viability of a VM approach to simultaneous execution, we have conducted some preliminary experiments on the costs of snapshotting and resuming. Because support for resuming UML is not yet complete, we present results based on a process similar in size (in terms of virtual memory image) to a UML VM. Table 1 presents the time to snapshot and resume processes of varying sizes. Experiments were run on a 766 MHz Pentium III with 256 MB of RAM. Snapshot times are short enough that network connections are unlikely to be disrupted. Note that resume times are consistently low, and independent of process size, because resuming only maps the process's data into memory. The data will be faulted in later as needed.
Based on these results, we believe that a virtual machine based approach is appropriate for providing upgrade/rollback facilities for synchronous upgrades. Note that it may be possible to improve the snapshotting time for large processes by performing lazy snapshots. For example, we could mark the pages of the virtual machine process as copy-on-write, and then save the snapshot data in the background.
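To make the lazy snapshot idea concrete, the following sketch simulates it over a list of in-memory pages. It is only a model of the bookkeeping: a real implementation would rely on the operating system's page protection rather than an explicit `write` method, and the class and method names are hypothetical. When a page is modified before the background saver reaches it, its original contents are preserved first, so the saved snapshot reflects the state at the moment the snapshot began.

```python
class LazySnapshot:
    """Simulates a copy-on-write snapshot saved incrementally in the background."""

    def __init__(self, pages):
        self.pages = pages      # live memory: a list of bytes objects
        self.preserved = {}     # page index -> contents at snapshot time
        self.saved = {}         # page index -> contents written to the snapshot
        self.next_index = 0     # next page the background saver will write
        self.active = False

    def begin(self):
        """Start a snapshot; conceptually marks every page copy-on-write."""
        self.preserved.clear()
        self.saved.clear()
        self.next_index = 0
        self.active = True

    def write(self, index, data):
        """Modify a page, preserving the original if it is not yet saved."""
        unsaved = index >= self.next_index
        if self.active and unsaved and index not in self.preserved:
            self.preserved[index] = self.pages[index]
        self.pages[index] = data

    def save_next(self):
        """Background step: save one page. Returns False once all are saved."""
        if self.next_index >= len(self.pages):
            self.active = False
            return False
        i = self.next_index
        # Prefer the preserved original over the (possibly modified) live page.
        self.saved[i] = self.preserved.pop(i, self.pages[i])
        self.next_index += 1
        return True
```

In this model the application is paused only for `begin`, while the cost of writing the snapshot is spread across later calls to `save_next`.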