Next: 3.5 Reliable FastReplica Algorithm Up: 3 FastReplica Algorithm Previous: 3.3 Preliminary Performance Analysis


3.4 FastReplica in the Large

In this section, we generalize FastReplica in the small to the case where the file must be replicated to a set of nodes numbering in the hundreds or thousands. Let $k$ be the number of network connections chosen for concurrent transfers between a single node and multiple receiving nodes (i.e. $k$ limits the number of nodes in a group for the Multiple Unicast or FastReplica strategies). An appropriate value of $k$ can be determined experimentally via probing. Heterogeneous nodes may be capable of supporting different numbers of connections; let $k$ be the number of connections suitable for most of the nodes in the overall replication set. A natural way to scale FastReplica in the small to a large number of nodes is to apply it iteratively: partition the nodes into replication groups of $k$ nodes, and let each node that has received the file serve as the origin node for a further group. Schematically, this procedure is shown in Figure 6, where circles represent the nodes and boxes represent the replication groups. The arrows connecting one node with a set of other nodes indicate the origin node and the recipient nodes involved in communications at a particular iteration of the algorithm.
Figure 6: FastReplica in the large: iterative replication process.
At the first iteration, node $N_0$ replicates file $F$ to group $G^1_1$, consisting of $k$ nodes, using the FastReplica in the small algorithm. At the second iteration, each node $N_i^1$ $(1 \leq i \leq k)$ of group $G^1_1$ can serve as the origin node propagating file $F$ to another group $G^2_i$. Thus in two iterations, file $F$ can be replicated to $k \times k$ nodes; correspondingly, in three iterations, file $F$ can be replicated to $k \times k \times k$ nodes. The general FastReplica algorithm is based on this reasoning. Consider the problem of replicating file $F$ across nodes $N_1, ..., N_n$, and let ${n \over k} = m$. Then all the nodes are partitioned into $m$ groups:

\begin{displaymath}G^1, G^2, ..., G^m\end{displaymath}

where each group has $k$ nodes. Any number $m$ can be represented as
\begin{displaymath}
m = { {c_1} \times k^{i_1} + {c_2} \times k^{i_2} + ... + {c_j} \times k^{i_j} }
\end{displaymath} (8)

where $i_1 > i_2 > ... > i_j \geq 0$ and $0 < c_1, ..., c_j < k$. In effect, this is the $k$-ary representation of the number $m$. This representation defines the rules for constructing a tree structure similar to the one shown in Figure 6. In particular, the height of such a tree is ${i_1}+1$, which is the number of iterations in the general FastReplica algorithm. From this representation, the rules for constructing the corresponding distribution lists of nodes are straightforward; we omit the technical details of the distribution list construction to keep the description of the overall algorithm concise. If the targeted number $n$ of nodes for a file replication is not a multiple of $k$, i.e.

\begin{displaymath}n = k \times m + r \end{displaymath}

where $r < k$, then there is one ``incomplete'' group $\hat{G}$ with $r$ nodes in it. The best way to deal with this group is to arrange it as a leaf group in the shortest subtree. Let $G^{\prime} = \{N_1^{\prime}, ..., N_k^{\prime}\}$ be a replication group in the shortest subtree.
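The partitioning above, together with the iteration count given by Formula (8), can be sketched in Python. This is a minimal illustration; the function and key names are ours, not the paper's, and it assumes $n \geq k$:

```python
def replication_plan(n, k):
    """Split n target nodes into m = n // k complete groups of k nodes
    plus one incomplete group of r = n % k nodes, and compute the
    replication-tree height i_1 + 1 from the k-ary representation of m
    (Formula (8)). Assumes n >= k."""
    m, r = divmod(n, k)
    # k-ary digits of m, most significant first; i_1 is the exponent
    # of the leading digit, i.e. len(digits) - 1.
    digits = []
    q = m
    while q > 0:
        digits.append(q % k)
        q //= k
    digits.reverse()
    i1 = len(digits) - 1
    return {"complete_groups": m,
            "incomplete_group_size": r,
            "iterations": i1 + 1}
```

For instance, with $k = 10$ and $n = 1000$ this yields $m = 100$ complete groups, no incomplete group, and a tree of height 3.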
Figure 7: Communications between the nodes of regular replication group $G^{\prime }$ and incomplete replication group $\hat{G}$: special step.
The communications between groups $G^{\prime }$ and $\hat{G}$ follow a slightly different file exchange protocol. All the nodes in $G^{\prime }$ have already received all subfiles $F_1, ..., F_k$ comprising the entire original file $F$. Each node $N_i^{\prime}$ of group $G^{\prime }$ opens $r$ concurrent network connections to all $r$ nodes of group $\hat{G}$ and transfers its subfile $F_i$, as shown in Figure 7. In this way, at the end of this step, each node of group $\hat{G}$ has all subfiles $F_1, ..., F_k$ of the original file $F$. We will refer to this step as the special step.

Example. Let $k = 10$. How many algorithm iterations are required to replicate the original file to 1000 nodes? The 1000 nodes are partitioned into $m = 1000 / 10 = 100$ groups, and by Formula (8) above, $m$ has the representation

\begin{displaymath}100 = 1 \times 10^2\end{displaymath}

so the replication tree has height $i_1 + 1 = 3$. Thus, in three algorithm iterations ($10 \times 10 \times 10$), the original file can be replicated among all 1000 nodes. At each iteration, the replication process follows FastReplica in the small, i.e. the iteration consists of two steps, each transferring a $1 \over k$-th portion of the original file $F$.

Let Multiple Unicast follow a recursive replication tree similar to the one defined above for general FastReplica and shown in Figure 6, with the only difference being that communications between the origin nodes and the recipient nodes follow the Multiple Unicast schema, i.e. the origin node transfers the entire file $F$ to the corresponding recipient nodes by simultaneously using $k$ concurrent network connections. Thus, in three algorithm iterations, by using Multiple Unicast recursively, the original file can be replicated among 1000 nodes.
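The special step between a regular group and an incomplete group can be sketched as follows. The node identifiers and the returned transfer triples are hypothetical stand-ins for the actual concurrent network connections:

```python
def special_step_transfers(g_prime, g_hat):
    """Enumerate the transfers of the special step: node N'_i of the
    complete group g_prime sends its subfile F_i (i = 1..k) to every
    node of the incomplete group g_hat.

    Returns (sender, receiver, subfile_index) triples; once they all
    complete, every node in g_hat holds all k subfiles of file F.
    """
    return [(sender, receiver, i)
            for i, sender in enumerate(g_prime, start=1)
            for receiver in g_hat]
```

Each of the $k$ senders opens $r$ connections, so the step performs $k \times r$ subfile transfers in total, and every node of $\hat{G}$ receives each of the $k$ distinct subfiles exactly once.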