Measures

Next: Empirical results Up: Security evaluation Previous: Password distribution

Measures

We are primarily concerned with measuring the ability of an attacker to guess the password of a user. Given accurate values for $\Pr\left[{p}^{(k)} \leftarrow {\cal S}\right]$ for each ${p}^{(k)}$ , a measure that indicates this ability is the ``guessing entropy'' [18] of passwords. Informally, guessing entropy measures the expected number of guesses an attacker with perfect knowledge of the probability distribution on passwords would need in order to guess a password chosen from that distribution. If we enumerate passwords ${p_1}^{(k)}$ , ${p_2}^{(k)}$ , $\ldots$ in non-increasing order of $\Pr\left[{p_i}^{(k)} \leftarrow {\cal S}\right]$ , then the guessing entropy is simply

$\begin{displaymath} \sum_{i > 0} i \cdot \Pr\left[{p_i}^{(k)} \leftarrow {\cal S}\right] \end{displaymath}$

(6)

Guessing entropy is closely related to Shannon entropy, and relations between the two are known.⁴ Since guessing entropy intuitively corresponds more closely to the attacker's task in which we are interested (guessing a password), we will mainly consider measures motivated by the guessing entropy.

The direct use of (6) to compute guessing entropy using the probabilities in (5) is problematic for two reasons. First, an attacker guessing passwords will be offered additional information when performing a guess, such as the set of available categories from which the next image can be chosen. For example, in Face, each image choice is taken from nine images that represent nine categories of images, chosen uniformly at random from the twelve categories. This additional information constrains the set of possible passwords, and the attacker would have this information when performing a guess in many scenarios. Second, we have found that the absolute probabilities yielded by (5) can be somewhat sensitive to the choice of $\lambda_{\hat{\ell}}$ , which introduces uncertainty into calculations that utilize these probabilities numerically.

**Figure 3:** Measures versus $\lambda _0$ for Face
$\begin{figure}\centerline{ \epsfig{figure=face-unigram-entropy.eps,width=3in,clip=} }\end{figure}$

**Figure 4:** Measures versus $\lambda _0$ for Story
$\begin{figure}\centerline{\epsfig{figure=story-unigram-entropy.eps,width=3in,clip=}}\end{figure}$

To account for the second of these issues, we use the probabilities computed with (5) only to determine an enumeration $\Pi = ({p_1}^{(k)}, {p_2}^{(k)}, \ldots)$ of passwords in non-increasing order of probability (as computed with (5)). This enumeration is far less sensitive to variations in $\lambda_{\hat{\ell}}$ than the numeric probabilities are, and so we believe this to be a more robust use of (5). We use this sequence to conduct tests with our dataset in which we randomly select a small set of ``test'' passwords from our dataset (20% of the dataset), and use the remainder of the data to compute the enumeration $\Pi$ .

We then guess passwords in order of $\Pi$ until each test password is guessed. To account for the first issue identified above, namely the set of available categories during password selection, we first filter from $\Pi$ the passwords that would have been invalid given the available categories when the test password was chosen, and obviously do not guess them. By repeating this test with non-overlapping test sets of passwords, we obtain a number of guesses per test password. We use $G_{{\cal S}}^{\rm avg}$ to denote the average over all test passwords, and $G_{{\cal S}}^{\rm med}$ to denote the median over all test passwords. Finally, we use $G_{{\cal S}}^x$ for $0 < x \le 100$ to denote the number of guesses sufficient to guess percent of the test passwords. For example, if $25\%$ of the test passwords could be guessed in or fewer guesses, then $G_{{\cal S}}^{25} = 6$ .

We emphasize that by computing our measures in this fashion, they are intrinsically conservative given our dataset. That is, an attacker who was given 80% of our dataset and challenged to guess the remaining 20% would do at least as well as our measures suggest.

Next: Empirical results Up: Security evaluation Previous: Password distribution