Check out the new USENIX Web site. next up previous
Next: Implementation Up: In-Place Rsync: File Synchronization Previous: Cycle Breaking

Performance

Our experimental results compare ip-rsync to traditional rsync and evaluate different policies for breaking cycles found in ip-rsync graphs. Results indicate that ip-rsync degrades performance in bandwidth constrained environments, which we expect since it always transmits more data over the network. However, when factors other than bandwidth limit performance, our tests show that in-place rsync outperforms traditional rsync. Although ip-rsync requires extra computation, it reduces disk writes and file-system block allocations.

Our experimental data set provides an example of files used on handhelds that need to be updated over wireless networks. The data set consists of 1523 pairs of versioned files obtained from https://www.handhelds.org/. The files include a variety of kernel binaries and compiled programs that are intended to be downloaded to handheld computers. The files in our dataset target the same audience and devices that ip-rsync benefits the most. To collect data, we downloaded the software archive and ran scripts that search the archive for multiple versions of the same files. The original and processed data are available from the Hopkins Storage Systems lab at https://hssl.cs.jhu.edu/ipdata/.

In our experiments, we synchronize each pair of versions with rsync and in-place rsync. For in-place rsync, we compare two different cycle breaking methods: ``delete node'' that deletes the final node found in a cycle and ``trim node'' that trims the least possible number of bytes in each detected cycle.

Ip-rsync incurs some compression loss from encoding overhead. In-place rsync adds four bytes to each twelve byte codeword in order to encode offsets in the target file. These four bytes have a very different effect on compression and on bandwidth overhead. We illustrate this point with a simple worst-case example in which all commands are COPY commands. This results in negligible compression loss: 4 bytes degrade compression by 0.55% in the default 700 byte block. However, the 4 bytes increase the bandwidth by 33% - 16 bytes per COPY codeword as opposed to 12 bytes. The overall bandwidth overhead of in-place rsync in our experiments is less than 5%, much less than the 33% worst case bound. Data transferred in ADD commands dominates bandwidth, which mitigates overhead from codewords.

Overall, ip-rsync achieves compression almost identical to rsync. In addition to encoding overheads, ip-rsync loses compression when eliminating cyclic dependencies. With the delete-node policy, the compression lost by ip-rsync compared to rsync was 0.543%. The trim-node policy cost 0.545% in compression. The difference is negligible. The overall increase in transmitted data averages 0.544% of the size of the original file.

In-place rsync pays for its decreased parallelism and compression with longer transfer times at low bandwidths. For low bandwidths, the in-place algorithm spends 10% more time in the scanning and command transmission steps combined than the original rsync algorithm requires to complete the parallel hash search and command transmission phase. The extra time spent in these steps is fundamental to algorithms that reorder commands and is therefore unavoidable.

Figure 6: The graph shows transfer times of the entire dataset at different bandwidths using rsync and ip-rsync with two cycle breaking heuristics. Ip-rsync performs more slowly for bandwidth-limited transfers. Rsync incurs extra overhead in the ext2 file system when allocating space in the temporary file and, as a result, ip-rsync outperforms rsync at higher bandwidths. The cycle breaking strategies make no noticeable difference in transfer time.
\includegraphics[]{speed.eps}


The performance of ip-rsync scales almost identically to that of rsync (Figure 6). Only at the smallest bandwidths can the effect of transferring the offset with each command be seen. Otherwise, ip-rsync has negligible latency and bandwidth overhead.

Although ip-rsync loses parallelism within a single file, it preserves parallelism across many files. Ip-rsync (and rsync) overlap the transmission of one file with scanning in a subsequent file. Our results show that overlapped execution is the dominant form of parallelism. In one instance of a bandwidth-limited transfer, the entire ip-rsync synchronization took 300 seconds and rsync required 290 seconds. The ip-rsync algorithm transmitted an extra 312,304 bytes, which accounts for the increase in end-to-end time. The standard deviation of the compression loss per file (the average compression loss of is 0.544%) was only 0.8%.

The overhead that arises from decreased parallelism is negligible. Comparing rsync and ip-rsync on a single file would indicate a greater performance loss. However, parallelism lost within a single file is recovered when overlapping multiple files.

While developing ip-rsync, we expected that rsync would outperform ip-rsync in all cases. We believed this because the changes to the algorithm include no improvements in bandwidth, memory usage, or processing. Also, rsync allows for more parallelism between the source and target.

When I/O limits performance (as opposed to bandwidth), ip-rsync outperforms rsync by 2-3% (see Figure 6 at bandwidths greater than 20,000). We account for the decrease in speed by considering the areas where ip-rsync reduces overhead. Ip-rsync performs fewer file-system block allocations using FFS-like file systems [11] that update data in-place. Our tests used the ext2 file system. However, a copy-on-write file system [5,14] would negate these benefits, as every write allocates a new file system block. In our further discussion, we refer to our trials on ext2. By eliminating the need for temporary space, ip-rsync allocates space only as required to increase the file size. In contrast, rsync allocates all blocks in the temporary file, then deallocates the original blocks. The time required to allocate space for a temporary copy for a traditional rsync balances out the decreased parallelism and compression. Furthermore, ip-rsync requires fewer disk writes when files are changed slightly. Ip-rsync can ignore COPY commands which have the same source and destination. Traditional rsync must copy all blocks regardless of whether they remain at the same offset in the file.

Buffering ADD and COPY commands for in-place rsync requires extra memory, but utilizes far less space than that required by rsync. The amount of extra memory scales with file size, but is much smaller than that of a temporary copy. On average, the extra memory needed is only 3.1% of file size, with a standard deviation of 2.9%. The method for cycle breaking by trimming nodes required 3.2% of the file size in data memory, while the delete node algorithm required only 3.0%. The buffering of in-place rsync can be problematic in resource-limited environments. Although we attempt to minimize the space required, it is possible that the algorithm can exceed available memory. We plan to implement the windowing techniques described in section 6.1 to address this problem.


next up previous
Next: Implementation Up: In-Place Rsync: File Synchronization Previous: Cycle Breaking
2003-04-08