Comparing probabilities

Next: Software and reliability Up: Questions about compare-by-hash Previous: Silent, deterministic, hard-to-fix errors

Comparing probabilities

One of the primary arguments for compare-by-hash is a simple comparison of the probability of a hash collision (very low) and the probability of some common hardware error (also low but much higher). To show that we cannot directly compare the probability of a deterministic, data-dependent error with the probability of nondeterministic, data-independent error, let's construct a hash function that has the same collision probability as SHA-1 but, when used in compare-by-hash, will be a far more common source of error than any normal hardware error.

Define VAL-1(x) as follows:

$\begin{displaymath}\mbox{VAL-1}(x) = \left\{ \begin{array}{r@{\quad:\quad}l} x >... ...{SHA-1}(x) \\ x = 0 & \mbox{SHA-1}(1) \\ \end{array} \right. \end{displaymath}$

In other words, VAL-1 is SHA-1 except that the first two inputs map to the same output. This function has an almost identical probability of collision as SHA-1, but it is completely unsuitable for use in compare-by-hash. The point of this example is not that bad hash functions will result in errors, but that we can't directly compare the probability of a hash collision with the probability of a hardware error. If we could, VAL-1 and SHA-1 would be equally good candidates for compare-by-hash. The relationship between the probability of a hash collision and the probability of a hardware error must be more complicated than a straightforward comparison can reveal.

Next: Software and reliability Up: Questions about compare-by-hash Previous: Silent, deterministic, hard-to-fix errors

2003-06-16