We remind the reader that a large number of distinguishing features is not sufficient for strong security. For example, suppose all features are distinguishing for all users, but all users' feature descriptors are identical and the attacker knows this. Then an attacker who captures a user's device can trivially determine the key. Therefore, it is equally important that users' feature descriptors vary widely or, more precisely, that they are drawn from a distribution with high entropy. An entropy evaluation of users' utterances, based on phone recordings of users saying the same passphrase, is described in [16,17]. These studies suggest that the entropy available in user utterances is substantial even when users say the same passphrase. As already noted, however, those studies are insufficient in several ways: they involve only recordings of users taken over phone lines, and they are limited to a particular set of features. Unfortunately, the data sets with which we are presently working (see Sections 5.1 and 5.2) include too few users to enable meaningful measurements of the entropy of users' feature descriptors, and so here we report results for distinguishing features only.
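To make concrete why many distinguishing features alone do not guarantee security, the following Python sketch computes the plug-in (empirical) Shannon entropy of feature descriptors across a population of users. The two populations are hypothetical toy data. The plug-in estimator is an illustrative choice of ours, not the methodology of [16,17]. With n users the plug-in estimate can never exceed log2(n) bits, which is one reason a data set with few users cannot support meaningful entropy measurements.

```python
import math
from collections import Counter


def empirical_entropy_bits(descriptors):
    """Plug-in Shannon entropy, in bits, of the empirical distribution of
    feature descriptors (one tuple of bits per user)."""
    counts = Counter(descriptors)
    n = len(descriptors)
    # H = sum over observed descriptors of p * log2(1/p), with p = count / n.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


# Hypothetical populations of four users with 4-bit feature descriptors.
# Every feature could be distinguishing for every user in both populations.
identical = [(0, 1, 1, 0)] * 4            # all users share one descriptor
distinct = [(0, 1, 1, 0), (1, 0, 0, 1),
            (1, 1, 0, 0), (0, 0, 1, 1)]   # every user's descriptor differs

print(empirical_entropy_bits(identical))  # 0.0: the attacker learns nothing new
print(empirical_entropy_bits(distinct))   # 2.0, the maximum log2(4) for 4 users
```

In the first population, an attacker who knows the shared descriptor can regenerate any user's key, no matter how many features are distinguishing.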
In order to calculate the average number of distinguishing features per user, it is of course necessary to define when a feature is distinguishing. Let $\mu_i$ and $\sigma_i$ denote the mean and standard deviation of feature $\phi_i$ over the recent history of successful logins (see footnote 9). Then we say that the $i$-th feature is distinguishing if $|\mu_i - t_i| > k\sigma_i$ for some parameter $k$. Note that if feature $\phi_i$ is distinguishing, then either $\mu_i < t_i - k\sigma_i$, and so usually $b_i = 0$ for the user (see (1)), or $\mu_i > t_i + k\sigma_i$, and so usually $b_i = 1$ for the user. Intuitively, the parameter $k$ tunes the ``sensitivity'' of the scheme: a small $k$ yields more distinguishing features, and a large $k$ yields fewer. Naturally, $k$ must be tuned to balance a high number of distinguishing features against the user's ability to regenerate his key reliably. More distinguishing features are advantageous for security, but they also require increasingly similar utterances to regenerate the key. The parameter $k$ will play a central role in our evaluation.
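The distinguishing-feature test can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not our implementation. It assumes each feature $\phi_i$ has a threshold $t_i$, with $b_i = 1$ when $\phi_i$ exceeds $t_i$ and $b_i = 0$ otherwise. It also assumes the population standard deviation is used for $\sigma_i$. The function name `distinguishing_features` and the toy login history are hypothetical.

```python
import statistics


def distinguishing_features(history, thresholds, k):
    """Return {i: usual_bit} for every distinguishing feature i, i.e. every
    feature with |mu_i - t_i| > k * sigma_i.

    history:    list of feature vectors, one per successful login
    thresholds: per-feature thresholds t_i (assumed, as described above)
    k:          sensitivity parameter
    """
    result = {}
    for i, t in enumerate(thresholds):
        values = [login[i] for login in history]
        mu = statistics.mean(values)
        sigma = statistics.pstdev(values)  # population std. dev. (an assumption)
        if abs(mu - t) > k * sigma:
            # mu > t + k*sigma: phi_i usually exceeds t_i, so b_i is usually 1.
            # mu < t - k*sigma: phi_i usually falls below t_i, so b_i is usually 0.
            result[i] = 1 if mu > t else 0
    return result


# Hypothetical history of three successful logins, three features each.
history = [[2.1, 0.2, -1.9],
           [1.9, -0.1, -2.1],
           [2.0, 0.1, -2.0]]
thresholds = [0.0, 0.0, 0.0]

print(distinguishing_features(history, thresholds, k=2.0))   # {0: 1, 2: 0}
print(distinguishing_features(history, thresholds, k=30.0))  # {}
```

Feature 1 hovers near its threshold and is never distinguishing at $k = 2$. Raising $k$ to 30 removes the remaining two, illustrating how a larger $k$ yields fewer distinguishing features.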
The features that we use in the balance of this paper are
described in [16, Section 3.2]. Each is defined by
comparing the position of a vector characterizing a segment of the
utterance to a fixed plane. This plane is a parameter of our scheme,
and though we will rarely mention it below, it is important for the
reader to be aware that the data we present are derived from a plane
selected, using our data, to optimize our measures in certain ways.
On the one hand, this means that our data presents what could be
achieved with a good selection of this plane, and is thus optimistic
in this regard. On the other hand, since this plane is selected by
searching through a small set of candidate planes, (infinitely) many
planes are omitted from this search. Consequently, it is likely that
planes yielding better measures exist. The experimentation we have
conducted thus far does not permit us to conclude how to select this
plane in general, and this continues to be an area of our ongoing
work.
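The role of the plane, and of the search over a small set of candidate planes, can be sketched in Python as follows. This is a hypothetical illustration, not the feature definition of [16, Section 3.2], which we do not reproduce. It assumes a plane $\{x : w \cdot x = c\}$ with normal $w$ and offset $c$, and it treats the signed value $w \cdot x - c$ of a segment's characterizing vector $x$ as the feature, with threshold 0. It also assumes candidates are ranked by the average number of distinguishing features per user. All function names and the toy data are ours. Because the plane is chosen on the same data it is then evaluated on, the resulting counts are optimistic in exactly the sense described above.

```python
import statistics


def plane_feature(vector, normal, offset):
    """Signed position of a segment vector relative to the plane
    {x : normal . x = offset}; positive on one side, negative on the other."""
    return sum(n * x for n, x in zip(normal, vector)) - offset


def feature_bit(vector, normal, offset):
    # Hypothetical bit assignment: 1 if the vector lies on the positive side.
    return 1 if plane_feature(vector, normal, offset) > 0 else 0


def count_distinguishing(history, plane, k):
    """Distinguishing features for one user under one plane.
    history: list of logins; each login is a list of segment vectors."""
    normal, offset = plane
    count = 0
    for i in range(len(history[0])):
        values = [plane_feature(login[i], normal, offset) for login in history]
        mu, sigma = statistics.mean(values), statistics.pstdev(values)
        if abs(mu) > k * sigma:  # threshold 0 corresponds to the plane itself
            count += 1
    return count


def select_plane(candidates, users, k):
    """Pick the candidate plane maximizing the average number of
    distinguishing features over users (one hypothetical criterion)."""
    return max(candidates,
               key=lambda p: statistics.mean(
                   count_distinguishing(h, p, k) for h in users))


# Two hypothetical candidate planes in 2-D: x = 0 and y = 0.
candidates = [((1.0, 0.0), 0.0), ((0.0, 1.0), 0.0)]
# One hypothetical user: three logins, two segments per login.
user_a = [[(2.0, 0.1), (-2.0, -0.2)],
          [(2.2, -0.1), (-1.8, 0.2)],
          [(1.8, 0.0), (-2.2, 0.0)]]

print(select_plane(candidates, [user_a], k=2.0))  # ((1.0, 0.0), 0.0)
```

Here the plane $x = 0$ separates both segments consistently, whereas $y = 0$ cuts through the noise. With infinitely many planes and only a finite candidate list, such a search can only find the best plane among those tried.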