There are other possible solutions to the problem that we are addressing; we discuss four of them here.
Delta shipping. The idea is to ship only the incremental difference, which is also called the delta, between different versions of a file. It has been proposed by many people and is currently being used as a general mechanism [24] or in specific systems including file systems [5], web proxies [12], file archives [11], and source-file repositories [22,16].
Deltas can be computed not only for text files but also for binary files. The rsync algorithm [24] deserves particular mention. When shipping a file, the sending host suppresses the transfer of any block of data that is already present on the receiving host, as determined from checksum information supplied by the receiver. The algorithm uses a rolling checksum so that matched blocks can start at any byte offset, not just at multiples of the block size.
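The rolling property can be sketched as follows. This is a simplified illustration in the spirit of rsync's weak checksum, not the actual rsync implementation (which pairs the weak checksum with a strong one, such as MD4, to confirm matches); the function names are our own.

```python
# Sketch of an rsync-style weak rolling checksum (a simplification for
# illustration; real rsync also uses a strong checksum to confirm matches).
M = 1 << 16  # both checksum components are kept modulo 2^16

def weak_checksum(block):
    """Compute the two-part checksum (a, b) over a block of bytes."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte in O(1): drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b

data = b"the quick brown fox jumps over the lazy dog"
n = 8
a, b = weak_checksum(data[:n])           # checksum of data[0:8]
a, b = roll(a, b, data[0], data[n], n)   # roll the window to data[1:9]
assert (a, b) == weak_checksum(data[1:n + 1])  # matches a from-scratch computation
```

Because each one-byte slide costs constant time, the sender-side search for matching blocks can try every offset of the file at essentially the cost of one pass.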
Delta shipping has several limitations. First, a newly created file has no previous version. Second, the effectiveness of delta shipping depends largely on how similar the two versions of a file are, and on how the incremental differences are distributed within the file. In pathological cases, a slightly changed file may require a huge delta. This can happen, for example, when a string is globally substituted throughout a file, or when the brightness or contrast of an image is changed. In general, we believe operation shipping can achieve a larger reduction of network traffic.
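The global-substitution case can be illustrated with a toy example of our own (not drawn from the paper): one conceptual edit by the user changes every line of a file, so a line-oriented delta ends up as large as the file itself.

```python
# Hypothetical illustration of the pathological case: a single global
# substitution touches every line, so a line-oriented delta contains
# the whole old file plus the whole new file.
import difflib

old = "\n".join(f"value = {i}; total = value + offset" for i in range(100))
new = old.replace("value", "amount")  # one conceptual edit by the user

diff = list(difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm=""))
changed = sum(1 for line in diff
              if line.startswith(("-", "+")) and not line.startswith(("---", "+++")))
print(changed)  # 200: each of the 100 lines appears as both a removal and an addition
```

An operation-shipping system, by contrast, could ship a description of the substitution itself, whose size is independent of the file length.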
On the other hand, delta shipping does not involve re-execution of applications or pre-arrangement of surrogate clients, as operation shipping does. It is therefore simpler in terms of system administration. We believe delta shipping and operation shipping can complement each other in a distributed file system. In particular, delta shipping can be used when operation shipping has failed for some updates and the file system has resorted to value shipping.
Data compression. Data compression reduces the size of a file by removing the redundancy in the file. This technique can be used in a file system or a web proxy [12]. However, the reduction factors achieved by data compression may be smaller than those of operation shipping. We did a small performance study using a representative implementation: the gzip utility, which uses Lempel-Ziv coding (LZ77). We ran gzip on the updated files of the 16 tests in Section 4, and listed the expected traffic volume and expected traffic reduction from compressing the files before shipping them. The results are shown in Figure 4 (the sixth and seventh columns). The expected traffic reductions by data compression ranged from 2.7 to 8.1, substantially smaller than those achieved by operation shipping, which ranged from 12.0 to 245.7. We were not surprised by the results, since operation shipping exploits the semantic information of the user operations, whereas data compression operates only generically on the file contents. Like delta shipping, data compression can complement operation shipping, and be used when our file system has resorted to value shipping.
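As a rough, independent illustration (not a reproduction of the paper's experiment), the following sketch compresses the source text of a Python standard-library module with gzip. Mixed English-and-code text of this sort typically shrinks by only a small factor, of the same order as the modest factors reported above; the exact ratio will vary with the Python version.

```python
# Rough illustration only: gzip a few tens of kilobytes of real text
# (the source of the standard-library difflib module) and report the
# resulting compression factor.
import gzip
import inspect
import difflib

original = inspect.getsource(difflib).encode()
compressed = gzip.compress(original)
print(f"{len(original)} -> {len(compressed)} bytes, "
      f"factor {len(original) / len(compressed):.1f}")
```

Because gzip sees only the byte stream, no amount of tuning lets it approach the reductions available to a scheme that knows which user operation produced the file.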
Logging keystrokes. A file system may log keystrokes and mouse clicks, ship them, and replay them on the surrogate. As such, it may be transparent to an application even if the application is interactive. However, we are pessimistic about this approach, because it is very difficult to ensure that the logged keystrokes and mouse clicks will produce an identical outcome on the surrogate machine. Too many things can happen at run time that may cause the keystrokes to produce different results.
Operation shipping without involving the file system. Can we use operation shipping without involving the file system? We can imagine that someone may design a meta-application that logs every command a user types and, without involving the file system, remotely executes the same commands on a surrogate machine. We believe such a system would not work, for the following reasons. If the file system had no knowledge that the second execution was a re-execution, it would treat the files produced by the two executions as two distinct copies, and would force the client to fetch the surrogate copy. It might even conclude that there was an update/update conflict. Besides, such a meta-application cannot ensure the correctness of the re-execution. We therefore believe that the file system plays a key role in useful and correct operation shipping.