Next: Network-Sync Option Up: Smoke and Mirrors: Reflecting Previous: Goals:

Network-Sync Remote Mirroring

Network-sync strikes a balance between performance and reliability, offering performance similar to that of semi-synchronous solutions, but with increased reliability. We use a forward error correction (FEC) protocol to increase the reliability of high-quality optical links. For example, consider a link that drops one out of every 1 trillion bits, or one out of every 125 million 1 KB packets (the maximum error threshold beyond which current carrier-grade optical equipment shuts down). Such a link can be pushed into losing less than 1 out of every 10^16 packets by the simple expedient of sending each packet twice, a figure that begins to approach disk reliability levels [7,15]. By adding a callback that fires when error recovery data has been sent, we can permit the application to resume execution once these encoded packets are sent, in effect treating the wide-area link as a kind of network disk. In this case, data is temporarily "stored" in the network while being shipped across the wide-area link to the remote mirror. Figure 1 illustrates this capability.
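The arithmetic behind the duplication example can be checked with a short sketch. It assumes that 1 KB means 1000 bytes, that bit errors are independent, and that the two copies of a packet are lost independently; the function names are illustrative only and do not come from the paper.

```python
from fractions import Fraction

BIT_ERROR_RATE = Fraction(1, 10**12)  # one bad bit per trillion bits (from the text)
PACKET_BITS = 8 * 1000                # assumption: 1 KB = 1000 bytes


def packet_loss_rate(bit_error_rate=BIT_ERROR_RATE, packet_bits=PACKET_BITS):
    """Approximate probability that a packet contains at least one bad bit."""
    # For very small error rates, P(loss) is well approximated by bits * BER.
    return bit_error_rate * packet_bits


def loss_rate_with_copies(copies, single_loss=None):
    """Probability that every copy is lost, assuming independent losses."""
    if single_loss is None:
        single_loss = packet_loss_rate()
    return single_loss ** copies
```

Under these assumptions, `packet_loss_rate()` is exactly 1 in 125 million, and `loss_rate_with_copies(2)` is 1 in 1.5625 x 10^16, which is below the 1-in-10^16 level. Real links often lose packets in bursts, so the independence assumption is optimistic.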

One can imagine many ways of implementing this behavior (e.g., in datacenter gateway routers). In general, implementations of network-sync remote mirroring must satisfy two requirements. First, they must proactively enhance the reliability of the network, sending recovery data without waiting for any form of negative acknowledgment (e.g., TCP fast retransmit) or for timeouts keyed to the round-trip time (RTT) to the remote site. Second, they must expose the status of outgoing data, so that the sender can resume activity as soon as a desired level of in-flight redundancy has been achieved for pending updates. Section 3.1 discusses the network-sync option, Section 3.2 discusses an implementation of it, and Section 3.3 discusses its tolerance to disaster.
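The two requirements can be illustrated with a minimal sketch. This is not the implementation described in Section 3.2: the class `NetworkSyncSender`, the injected `transmit` callable, and the `on_redundant` callback are all hypothetical names, and plain duplication stands in for a real forward error correction code.

```python
class NetworkSyncSender:
    """Hypothetical sender illustrating the two requirements above."""

    def __init__(self, transmit, redundancy=2):
        # transmit: assumed callable(bytes) that places one packet on the wire.
        self.transmit = transmit
        # redundancy: total copies that must be in flight before release.
        self.redundancy = redundancy

    def send_update(self, payload, on_redundant):
        # Requirement 1: send the recovery data proactively, without waiting
        # for a negative acknowledgment or an RTT-based timeout.
        for _ in range(self.redundancy):
            self.transmit(payload)
        # Requirement 2: expose the status of outgoing data, so the caller
        # can resume once the desired in-flight redundancy is reached.
        on_redundant(payload)
```

A caller might pass a function that unblocks the application as `on_redundant`. The callback fires after the last redundant copy is handed to `transmit`, with no wait for the remote mirror to acknowledge.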



Hakim Weatherspoon 2009-01-14