Check out the new USENIX Web site. next up previous
Next: Algorithm Up: In-Place Rsync: File Synchronization Previous: Background

Design

Rsync synchronizes two files, bringing an old version of a file on the target up to date with a new version of a file on the source. Rsync sends as few bytes as possible by detecting common sections of the two versions and using the common data in the old version when building the new version. To detect the common sections, the target generates a weak and strong checksum for blocks in the target file. The target transfers the checksums to the source. On the source, rsync stores the weak checksums in a hash table. The checksums take approximately 100 times less space and bandwidth than the file. The source file is scanned, calculating the weak checksum at each offset. Rsync probes the hash table with each checksum. Upon finding a matching checksum, a strong checksum is computed and compared. Matching strong checksums indicate matching data with high probability. Rsync encodes matched data as a COPY command. The COPY encodes an action on the target that duplicates data from the reference file in the updated file. The algorithm encodes unmatched data as an ADD command, which includes the data to be added. The target receives encodings from the source. Encodings describe the new file sequentially (from first byte to last) so that the target can reconstruct the new version in a single pass. The use of a weak and strong checksum saves computation by allowing the source to generate and compute a strong checksum only when the weak checksum already matches. Also, the chosen weak checksum saves computation by rolling from offset $n$ to offset $n+1$ without recomputing the checksum based on all $k$ bytes [7]. Rolling checksums observe that strings of length $k$ at offsets $n$ and $n+1$ differ in only two bytes - the first byte of the string starting at $n$ and the last byte of the string starting at $n+1$. The algorithm computes a rolling checksum for offset $n+1$ by subtracting the contribution of the first byte of the checksum at offset $n$ and adding the contribution of the last byte of checksum at offset $n+1$.

Figure 2: The rsync process triangle with the sequence of steps necessary for an ip-rsync transfer.
\begin{figure}\centerline{%%
\input{proctri.pstex_t}}\end{figure}



Subsections
next up previous
Next: Algorithm Up: In-Place Rsync: File Synchronization Previous: Background
2003-04-08