Network-sync strikes a balance between performance and
reliability, offering performance similar to that of semi-synchronous
solutions, but with increased reliability. We use a forward-error
correction protocol to increase the reliability of high-quality
optical links. For example, a link that drops one out of every 1
trillion bits, or one out of every 125 million 1 KB packets (the maximum error
threshold beyond which current carrier-grade optical equipment shuts
down), can be pushed into losing fewer than 1 out of every $10^{16}$
packets by the simple expedient of sending each packet twice (assuming
the two copies are lost independently), a
figure that begins to approach disk reliability
levels [7,15]. By adding a callback that fires when error
recovery data has been sent, we can permit the application to resume
execution once these encoded packets are sent, in effect treating the
wide-area link as a kind of network disk. In this case, data is
temporarily ``stored'' in the network while being shipped across the
wide-area link to the remote mirror. Figure 1
illustrates this capability.
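
The duplication arithmetic above can be checked with a short calculation. The sketch below is illustrative only: it assumes that 1 KB means 1000 bytes, that a packet is lost whenever any one of its bits is dropped, and that the two copies of a packet are lost independently (bursty losses on a real link would weaken the result). The helper name `residual_loss` is hypothetical.

```python
from fractions import Fraction


def residual_loss(bit_loss_rate: Fraction, packet_bits: int, copies: int) -> Fraction:
    """Chance that every copy of a packet is lost, assuming independent losses."""
    # For very small rates, a packet is lost with probability of
    # roughly (bits per packet) * (per-bit loss rate).
    per_packet = bit_loss_rate * packet_bits
    # All copies must be lost for the packet to be unrecoverable.
    return per_packet ** copies


BIT_LOSS_RATE = Fraction(1, 10**12)  # one bit per trillion, from the text
PACKET_BITS = 8 * 1000               # assumption: 1 KB = 1000 bytes

single = residual_loss(BIT_LOSS_RATE, PACKET_BITS, copies=1)
double = residual_loss(BIT_LOSS_RATE, PACKET_BITS, copies=2)

print(f"one copy:   1 packet lost in {1 / single}")
print(f"two copies: 1 packet lost in {1 / double}")
```

Under these assumptions a single copy is lost once in 125 million packets, and both copies are lost roughly once in 1.6 x 10^16 packets.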
One can imagine many ways of implementing this behavior (e.g., in datacenter gateway routers). In general, implementations of network-sync remote mirroring must satisfy two requirements. First, they should proactively enhance the reliability of the network, sending recovery data without waiting for any form of negative acknowledgment (e.g., TCP fast retransmit) or for timeouts keyed to the round-trip time (RTT) to the remote site. Second, they must expose the status of outgoing data, so that the sender can resume activity as soon as a desired level of in-flight redundancy has been achieved for pending updates. Section 3.1 discusses the network-sync option, Section 3.2 discusses an implementation of it, and Section 3.3 discusses its tolerance to disaster.
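
To make the two requirements concrete, the following is a minimal sketch and not the implementation described in Section 3.2. It assumes equal-length packets and uses a single XOR parity packet per group as a stand-in for the forward-error correction code. The class `NetworkSyncSender`, its `write` and `flush` methods, and the `send` hook are all hypothetical names. Recovery data is sent proactively once a group fills, with no acknowledgment or timeout involved, and each update's callback fires only after the recovery data covering it has been handed to the link.

```python
from typing import Callable, List, Tuple


def xor_parity(packets: List[bytes]) -> bytes:
    """XOR equal-length packets into one repair packet (stand-in FEC code)."""
    parity = bytearray(len(packets[0]))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


class NetworkSyncSender:
    """Hypothetical sender: proactive repair data plus status callbacks."""

    def __init__(self, send: Callable[[bytes], None], group_size: int = 4):
        self.send = send              # assumption: hands one packet to the link
        self.group_size = group_size  # data packets covered by one repair packet
        self.pending: List[Tuple[bytes, Callable[[], None]]] = []

    def write(self, payload: bytes, on_protected: Callable[[], None]) -> None:
        # Ship the data immediately; the callback waits for the repair packet.
        self.send(payload)
        self.pending.append((payload, on_protected))
        if len(self.pending) == self.group_size:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        # Requirement 1: send recovery data without waiting for a NAK or timeout.
        self.send(xor_parity([payload for payload, _ in self.pending]))
        callbacks = [callback for _, callback in self.pending]
        self.pending = []
        # Requirement 2: expose status so the application can resume.
        for callback in callbacks:
            callback()


wire: List[bytes] = []  # records what was handed to the (simulated) link
done: List[str] = []    # records which updates have been reported as protected
sender = NetworkSyncSender(wire.append, group_size=2)
sender.write(b"aaaa", lambda: done.append("update-1"))
resumed_early = list(done)  # still empty: no repair packet has been sent yet
sender.write(b"bbbb", lambda: done.append("update-2"))

# A receiver that lost the first packet can rebuild it from the other two.
recovered = xor_parity([wire[1], wire[2]])
```

In this sketch the application would block after `write` until its callback runs, which happens after local transmission of the repair packet rather than after a round trip to the remote mirror. A real deployment would also need a timer to flush partially filled groups.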