Next: Stonith Devices: Node based Up: Implementing Clusters for High Previous: Important Lessons


I/O Fencing

Since clusters may transfer services (as hierarchies) among the nodes, it is vitally important that only a single copy of a given service be running anywhere, inside or outside the cluster. If this rule is violated, two instances of the service would be accessing and updating the same data, leading to immediate corruption.

For this reason, it is simply not good enough for a re-formed cluster to conclude that any node that can't be contacted is passive and not accessing current data; the cluster must take action to ensure this.

A primary worry is the so-called "Split Brain" scenario, where all communication between two nodes is lost, so each thinks the other is dead and tries to recover the services accordingly. This situation is particularly insidious if the communication loss was caused by a "hang" condition on the node currently running the service, because that node may hold cached data which it will flush to storage the moment it recovers from the hang.
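
The rule described above (act to fence an unreachable node before recovering its services) can be illustrated with a minimal sketch. This is a toy model, not the implementation discussed in this paper: the Cluster class, its fields, and the fence hook are all hypothetical names introduced for illustration, and the hook is assumed to return True only once the target node provably can no longer perform I/O.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Cluster:
    """Toy model of a re-formed cluster (all names are illustrative)."""
    members: Set[str]             # nodes that answered during re-formation
    owners: Dict[str, str]        # service name -> node last known to run it
    fence: Callable[[str], bool]  # hypothetical hook: True once the node cannot do I/O
    fenced: Set[str] = field(default_factory=set)

    def recover(self, local: str) -> List[str]:
        """Take over services of unreachable nodes, fencing each node first."""
        started = []
        for service, owner in sorted(self.owners.items()):
            if owner in self.members:
                continue  # owner is still in the cluster; leave the service alone
            if owner not in self.fenced:
                # Silence alone proves nothing: the owner may merely be hung
                # and could flush cached data later. Fence before recovering.
                if not self.fence(owner):
                    continue  # fencing failed, so starting a second copy is unsafe
                self.fenced.add(owner)
            self.owners[service] = local
            started.append(service)
        return started
```

In this sketch a service is left alone whenever fencing fails, which trades availability for data integrity: a stopped service is preferable to two copies updating the same data.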



James Bottomley 2004-05-12