
Software and reliability

On a more philosophical note, should software improve on hardware reliability, or should programmers accept hardware reliability as an upper bound on total system reliability? What would we think of a file system with a race condition triggered less often than disk I/O errors occur? What if it lost files only slightly less often than users accidentally delete them? Once we start playing the game of error relativity, where do we stop? Current software practice suggests that most programmers believe software should improve reliability: hence TCP checksums, asserts for hardware conditions that should be impossible, and handling of I/O errors. For example, the empirically observed rate of undetected errors in TCP packets is about 0.0000005% [13]. We could dramatically improve that rate by sending both the block and its SHA-1 hash, or we could slightly worsen it by sending only the hash.
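To make the trade-off concrete, here is a minimal sketch in Python (illustrative only, not from the paper; the sender/receiver names and the local block store are assumptions). The first pair of functions sends the data and its digest so corruption is detected end-to-end; the last function is the compare-by-hash option, where only the digest travels and a hash collision would be silently accepted as correct data.

    import hashlib

    def send_with_hash(block: bytes) -> tuple[bytes, bytes]:
        # Option 1: transmit the block together with its SHA-1 digest,
        # giving the receiver an end-to-end integrity check.
        return block, hashlib.sha1(block).digest()

    def receive_with_hash(block: bytes, digest: bytes) -> bytes:
        # Recompute the digest over the received block; a mismatch
        # means the block was corrupted in transit.
        if hashlib.sha1(block).digest() != digest:
            raise IOError("block corrupted in transit")
        return block

    def receive_hash_only(digest: bytes, store: dict[bytes, bytes]) -> bytes:
        # Option 2 (compare-by-hash): only the digest was sent. The
        # receiver substitutes a locally stored block with the same
        # hash; a collision here is undetectable and silently accepted.
        return store[digest]

In the first scheme, any corruption the hash can catch is caught; in the second, correctness rests entirely on the assumption that no two distinct blocks share a digest.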


