 
 
 
 
 
 
   
Applications or kernel components running in a cluster environment, such as a database or a cluster file system, can ensure that reads and writes to the same block do not proceed at the same time from two nodes in the cluster. They do so by taking a cluster-wide lock.
In a clustered environment, however, there can still be a conflict with respect to concurrent updates to the metadata of the snapshot device. The metadata of a snapshot is its map, which gives information about the mapping of original and snapshot blocks.
For example, suppose the application on node A issues write(blkno1), which takes from time t0 to t1, and node B issues either write(blkno1) or write(blkno2), which takes from time t2 to t3. If the write from B is to blkno1, a constraint such as t1 < t2, say, has to be satisfied; if it is to blkno2, there is no constraint. This assumes that a correct application ensures that writes to the same block never arrive simultaneously from two different nodes in the cluster. Now consider how the snapshot device driver (SDD) must behave. On the first write from node A, it allocates a new block and pushes the old data there (a copy-on-write, or COW, push). The problem is that the SDD instance on node B needs to know whether a block has already been COW pushed.
The simplest (and lowest-performance) solution is to have a single metadata area protected by one cluster-wide lock. Node B then takes this lock, reads the translation table, sees that blkno1 has already been pushed, and lets the write proceed. Alternatively, it sees that blkno2 has not yet been pushed, updates the SDD metadata to allocate a block, and does a COW push. This solution is slow because every write from every node is serialized on a single lock. One alternative is to divide the block allocation area into pieces, each with its own lock, so that there is more concurrency. This still serializes writes to any set of blocks controlled by the same lock.
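
The two locking schemes above can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not an actual driver: the class name SnapshotMap, its methods, and the modulo rule that assigns a block to a region are all hypothetical, and threading.Lock merely stands in for a cluster-wide lock that a real SDD would obtain from a distributed lock manager. Setting num_regions=1 gives the single-lock solution; a larger value gives the divided allocation area.

```python
import threading


class SnapshotMap:
    """Toy model of snapshot metadata split into independently locked regions.

    Assumption: threading.Lock stands in for a cluster-wide lock.
    """

    def __init__(self, num_regions=4, region_size=1024):
        self.num_regions = num_regions
        self.region_size = region_size
        # One lock, one translation table, and one allocation cursor per region.
        self.locks = [threading.Lock() for _ in range(num_regions)]
        self.tables = [{} for _ in range(num_regions)]  # original blkno -> snapshot blkno
        self.next_free = [0] * num_regions

    def write(self, blkno):
        """Handle a write to blkno; return True if it triggered a COW push."""
        region = blkno % self.num_regions  # hypothetical block-to-region rule
        with self.locks[region]:
            table = self.tables[region]
            if blkno in table:
                # Already COW pushed by some node: the write can go ahead.
                return False
            offset = self.next_free[region]
            if offset >= self.region_size:
                raise RuntimeError("snapshot allocation region is full")
            # Allocate a snapshot block from this region and record the mapping.
            # A real driver would copy the old data to the new block here.
            self.next_free[region] = offset + 1
            table[blkno] = region * self.region_size + offset
            return True

    def pushed_count(self):
        """Number of original blocks that have been COW pushed so far."""
        return sum(len(t) for t in self.tables)


smap = SnapshotMap(num_regions=4)
first_a = smap.write(1)   # node A writes blkno1: first write, COW push happens
second_b = smap.write(1)  # node B writes blkno1: sees it is already pushed
other_b = smap.write(2)   # node B writes blkno2: different region, different lock
```

In this sketch, writes to blkno1 and blkno2 take different locks and could proceed concurrently, whereas two blocks that map to the same region would still be serialized on that region's lock.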
 
 
 
 
