Check out the new USENIX Web site. next up previous
Next: 2.2 Creating a reliable Up: 2 The simplified scheme Previous: 2 The simplified scheme


2.1 Finding partners

We suggest using a simple central server to keep track of the computers in the system and their partner needs. Many other methods of finding partners are possible--for example, a Gnutella-like flooding approach could be used--but the central-server method has the advantage of being very simple to implement.

Each computer should periodically update the server with its identity and what partners it needs and has, including uptime and storage-swapping levels. When a computer needs a new partner, it contacts the server with its needs and obtains a list of candidate partners; it can then contact those computers directly and find out if they are still compatible.

Sometimes there may be no other computers looking for new partners. In that case, a computer looking for new partners needs to step between two existing partners that have an agreement similar to the one it desires: if $\cal A$ and $\cal B$ are partners, $\cal N$ can step between them so that $\cal A$ now has $\cal N$ for a partner instead of $\cal B$ and $\cal B$ now has $\cal N$ as a partner instead of $\cal A$. This leaves $\cal A$ and $\cal B$ with the same number of partners, but gives $\cal N$ two new partners of the type it wants. By having $\cal N$ copy $\cal A$ and $\cal B$'s data beforehand, this can be done atomically with no data loss.

To avoid complicated negotiation, we suggest appropriately quantizing uptime and storage-swapping levels. Ideally, to work well, the system should have many (at least a hundred, preferably more than ten thousand) members spanning the range of possible agreements. Computers wishing to swap huge amounts may still be out of luck finding compatible partners, but can compensate (with somewhat lower reliability) by using multiple partners swapping less each.

It is important that each partner in a pair be located at different sites in order to ensure all backups are stored off-site. Accordingly, computers should reject candidate partners that are co-located. This means that our scheme cannot be used safely within a single site. Additional reliability can be obtained by further diversification: a single computer should choose its partners from as many different sites and using as many different operating systems (to guard against viruses) as it can. To allow this, the identity information supplied to the central server should include a computer's ``location'' and operating-system type. Location information can either be obtained directly from the computer owner or estimated via IP ranges or domain-registry information.

Although the central server forms a single point of failure for finding new partners, it need keep no permanent state and is thus easily replaced or replicated should it fail or become a bottleneck. Its failure does not prevent backups or restorations from occurring; thus, as long as it is repaired within a reasonable amount of time (i.e., weeks), no real harm is done.


next up previous
Next: 2.2 Creating a reliable Up: 2 The simplified scheme Previous: 2 The simplified scheme
Mark Lillibridge 2003-04-07