Our algorithm requires the masters to be fully connected to each other through secure (e.g. cryptographically) communication links, and implement a reliable, total-ordering, broadcast protocol that can tolerate benign (non-malicious) server failures. The broadcast protocol itself is outside the scope of this paper; a good choice could be for example the protocol described in [8].
Using this reliable broadcast protocol, master servers ensure they always agree on the same sequential ordering for write requests. Through the same broadcast protocol, the masters also elect one of them to function as an auditor. The auditor checks (in the background) the correctness of computations performed by slaves, and takes corrective action when any of them is found acting maliciously.
Each master server is responsible for updating the servers in its slave set when the data content changes due to writes. This updating occurs only after the masters have comitted the write. The masters also periodically broadcast their slave list to the master set, so in the event of a master crash, the remaining ones will divide its slave set. This also entails that all the clients connected to the crashed server will have to go through the setup process again.
It is important to notice that a slave receives a state update only after that update has been comitted. The reason we have chosen this ``lazy'' state update algorithm, as opposed to having masters and slaves participate in the total ordering broadcast, is performance. Since only masters are trusted, a total ordering broadcast protocol including the slaves would have to be resistant to byzantine failures, and implementing such an algorithm over a WAN is extremely expensive. ``Lazy'' state updates make the write protocol much more efficient, but also weaken the consistency model since a client cannot be guaranteed that once his write is comitted it will be seen in all subsequent reads. We tackle this problem by introducing a special parameter, dubbed that bounds the inconsistency window for a given write operation: a client is guaranteed that once time has elapsed since comitting a write, no other client will accept a read that is not dependent on that write. It is worth stressing out that we do not guarantee that a write will propagate to all slaves in (this will violate the asynchronous nature of the WAN environment); there may be slaves for which it takes longer that to get a state update, but if they behave correctly they should stop handling user requests until they are back in sync (later we will show how to handle malicious slaves).