One of the primary arguments for compare-by-hash is a simple comparison of the probability of a hash collision (very low) and the probability of some common hardware error (also low but much higher). To show that we cannot directly compare the probability of a deterministic, data-dependent error with the probability of nondeterministic, data-independent error, let's construct a hash function that has the same collision probability as SHA-1 but, when used in compare-by-hash, will be a far more common source of error than any normal hardware error.

Define VAL-1(x) as follows:

In other words, VAL-1 is SHA-1 except that the first two inputs map to the same output. This function has an almost identical probability of collision as SHA-1, but it is completely unsuitable for use in compare-by-hash. The point of this example is not that bad hash functions will result in errors, but that we can't directly compare the probability of a hash collision with the probability of a hardware error. If we could, VAL-1 and SHA-1 would be equally good candidates for compare-by-hash. The relationship between the probability of a hash collision and the probability of a hardware error must be more complicated than a straightforward comparison can reveal.