Ideally, one would like any password handling algorithm to be a strong one-way function of the password--that is, given the algorithm's output and other inputs, an attacker should have little chance of learning even partial information she could not already have guessed about the password. Unfortunately, one-way functions are defined asymptotically with respect to their input lengths--an attacker has negligible probability of inverting a one-way function on sufficiently large inputs, but exactly how large depends on the attacker. Because there is a fixed limit to the size of passwords users will tolerate, we need a different criterion for functions on passwords.
Informally, we would like a password scheme to be ``as good as the passwords users choose.'' Given a probability distribution D on passwords, we define the predictability R(D) of the distribution to be the highest probability of any single password s in D: . A function of a password is secure if an attacker's probability of learning any partial information about the password is proportional to the product of the work she invests and the predictability of the password distribution.
What does it mean for an attacker to learn partial information about a password? We define partial information to be the value of any single-bit predicate on a password. Interesting predicates on passwords might include the first bit of a password, or the parity of bits in a password. An attacker can always guess certain predicates with high probability--for instance, the trivial predicate P(s)=1 which returns 1 on all passwords. If a function of a password is secure, however, its output should not let an attacker guess any predicate more accurately than she could have without the function's output.
More formally, let F(s,t) be a function. The argument s represents a user's secret password, which will be drawn from a probability distribution D. The argument t represents any additional non-secret inputs F might take. Let the values of t be drawn from a probability distribution T. We model an attacker as a randomized boolean circuit, A, that tries to guess a predicate P about a password. The cost of an attack--or the work invested by an attacker--is the number of gates in the circuit, which we denote |A|. We use the notation to denote the probability of statement B after an experiment in which variables are drawn from probability distributions , respectively. Now we can define what it means for a password function to resist attack. We say that function F(s,t) is an -secure password function if the following hold:
We should first note that this definition matches our intuition about a password hashing function like crypt. If users choose predictable enough passwords, knowing a password hash gives adversaries a large advantage--they can compare hashes of the most popular passwords to that of the password they are trying to break. If, additionally, one can guess a useful predicate without even looking at a password hash--for instance by knowing that the third character of most passwords is a lower-case letter--then clearly an adversary can guess this too.
If, however, no single password occurs with particularly high probability, an adversary should need to expend a large amount of effort (as measured in circuit gates) to discover any non-trivial information about a password. Finally, we also wish to prevent an attacker from finding other strings that hash to the same value as a password; such strings may prove equivalent to passwords during authentication. The requirement of second preimage resistance guarantees such collisions are hard to find, even with knowledge of the original password. It also ensures that F does not ignore any bits of its password input.
The definition implies that a secure password function F(s,t) must make non-trivial use of its second argument, t. To see this, consider that the first bit of F(s,0) is a perfectly valid predicate on passwords. An attacker could easily guess this predicate if either F ignored its second argument or the string 0 occurred in T with high probability. This point is not merely an academic one. A single-input password hashing function F(s) can be inverted by a circuit large enough to encode a lookup table mapping F(s) (or sufficiently many bits of F(s)) to s. The size of such a circuit depends only on the probability distribution of the passwords, not on the particulars of F.
As proposed by Morris and Thompson [9], however, lookup tables can be thwarted with the second input to F, which they call a salt. If a random salt is chosen whenever users establish new passwords, and if the salt space is large enough to ensure a negligible probability of recurrence, lookup tables offer an adversary no advantage; he may as well compute F at the time of attack. If, on the other hand, the salt space is too small, the output bits of F become useful predicates on passwords, a fact exploited by the QCrack [12] program described in Section 6.
While salted passwords defeat lookup tables, given a particular salt and hash, an adversary can still mount a brute force attack by evaluating F(s,t) on every possible password. It follows that the best security one can achieve is , where |F| is the cost in gates of implementing F. Usability requirements therefore effect a lower limit on --if people can only tolerate a one second delay for checking passwords, F can take at most one second to evaluate. F should not take significantly less, however, as this would unnecessarily weaken security.
The number of gates |A| that an adversary can reasonably muster for an attack increases constantly as hardware improves. Fortunately, so does the speed of machines that must legitimately evaluate F. That means passwords should not be hashed by a single function F with fixed computational cost, but rather by one of a family of functions with arbitrarily high cost. Instead of repeatedly throwing out functions like crypt and MD5 crypt to start over with more expensive but incompatible ones, systems should allow the cost of any password manipulation software to scale gracefully with a tunable parameter. Thus, can decrease as fast as hardware improves and users will tolerate. Compromised password databases will then enjoy maximum security against off-line attacks.
In summary, a good password function makes extracting any partial information about passwords as difficult as guessing passwords. A concrete parameter, , should characterize this difficulty. To achieve low values of , a password function must take a second input, the salt, that prevents adversaries from benefiting from large lookup tables. The best value of is inversely proportional to the cost of evaluating a password function. This establishes a lower limit for based on the maximum tolerable cost of evaluating F during legitimate use. As hardware speeds constantly improve, a good password scheme should allow the cost of F to increase gradually so that can decrease over time.
One final criterion for a good password function is then to minimize the value . That means one should make any password function as efficient as possible for the setting in which it will operate. The designers of crypt failed to do this. They based crypt on DES [10], a particularly inefficient algorithm to implement in software because of many bit transpositions. They discounted hardware attacks, in part because crypt cannot be calculated with stock DES hardware. Unfortunately, Biham [4] later discovered a software technique known as bitslicing that eliminates the cost of bit transpositions in computing many simultaneous DES encryptions. While bitslicing won't help anyone log in faster, it offers a staggering speedup to brute force password searches.
In general, a password algorithm, whatever its cost, should execute with near optimal efficiency in any setting in which it sees legitimate use, while offering little opportunity for speedup in other contexts. It should rely heavily on a CPU's fast instructions--for instance addition, bitwise exclusive-or, shifts, and memory access to state that fits in a processor's first level cache. Ideally these operations should all be portably accessible from high-level languages like C, so as to minimize the benefit of hand-coded assembly language implementations. Conversely, the algorithm should avoid operations like bit transposition on which customized hardware enjoys a large advantage.
A password function should also not lend itself to any kind of pipelined hardware implementation. It should permit relatively little speedup from any kind of precomputation--for instance, hashing 1,000 passwords with the same salt and hashing one password under 1,000 salts should each cost 1,000 times more than hashing a single password.