Check out the new USENIX Web site. next up previous
Next: Front-end Signal Processing Up: Toward Speech-Generated Cryptographic Keys Previous: Introduction


Background

In this section we give the background necessary to describe our contributions in this paper. Our general approach to generating a cryptographic key from a spoken utterance of a passphrase is described in [16] and is based on an approach initially developed for generating a key from a typed password and the user's keystroke timings during the entry of that password [15]. How this paper relates to these prior publications is described below. In our approach, the system is initialized by first generating a cryptographic key $K$, and then generating a collection of $2m$ shares of $K$ using a generalized secret sharing scheme (e.g., see [24] for a survey of such schemes), where $m$ is a system parameter. These shares are aligned in a $m \times 2$ table that is stored on stable storage. The shares must be generated and placed within the table so that $K$ can be reconstructed from any set of $m$ shares consisting of one share from each row of the table.

Upon entry of the passphrase, the system measures $m$ biometric features of the user's entry of the passphrase. We denote the $i$-th feature by $\phi_i$, and denote the value of feature $\phi_i$ on the $\ell$-th (successful or unsuccessful) attempt to log into this user's account (i.e., generate this user's key) by $\phi_i(\ell)$. In the case of a spoken passphrase, the features $\phi_i$ are features extracted from the user's utterance as described in [16]. For the $\ell$-th login attempt, the system then generates a bit string $b_\ell \in \{0,1\}^m$ from $\phi_0(\ell), \ldots,
\phi_{m-1}(\ell)$; $b_\ell$ is called a feature descriptor, and the $i$-th bit of $b_\ell$, denoted $b_\ell(i)$, is determined from the $i$-th feature $\phi_i(\ell)$. Algorithms for generating $b_\ell$ from $\phi_0(\ell), \ldots,
\phi_{m-1}(\ell)$ are proposed and evaluated in [16], but for the purposes of this paper, the reader can think of $b_\ell$ being determined by

\begin{displaymath}
b_\ell(i) \leftarrow \left\{
\begin{array}{ll}
0 & \mbo...
...\tau_i$} \\
1 & \mbox{otherwise}
\end{array}
\right.
\end{displaymath} (1)

where $\tau_i$ denotes some fixed threshold value. The system then attempts to reconstruct $K$ using the table elements at positions $\langle
i, b_\ell(i)\rangle _{0 \le i < m}$. When the login index $\ell$ is not necessary in the discussion, we will typically omit it.

After a history of feature descriptors from successful logins is observed (i.e., logins in which the key was successfully reconstructed), those elements of the table that are not typically accessed by the user are perturbed randomly. So, for example, if the feature descriptors $b$ induced by a user's biometric measurements are such that $b(4) = 1$ consistently, then the $\langle 4,0\rangle $ element of the table is randomly altered. If $b(i)$ is sufficiently consistent that element $\langle i, 1-b(i)\rangle $ in the table is perturbed in this way, then $b(i)$ is called a distinguishing feature. We will give a precise characterization of distinguishing features in Section 5.

The correct user, when inducing feature descriptors consistent with those she has induced in the past, should not encounter any of the altered elements in the table. We have found, however, that in practice it is necessary for the system to attempt reconstruction using feature descriptors within some Hamming distance $c_{\rm max}$ of the induced feature descriptor, to correct for up to $c_{\rm max}$ ``errors'' by the user (e.g., see [15]). This results in up to

\begin{displaymath}
\sum_{i = 0}^{c_{\rm max}} {m \choose i}
\end{displaymath} (2)

secret sharing reconstructions per key (re)generation attempt. Security of this technique requires that an adversary who captures the device be unable to efficiently differentiate a random table element from a valid share of $K$. If this is the case, then an adversary may be forced to simply guess feature descriptors until he finds $K$. If $d$ table elements were randomized (i.e., there are $d$ distinguishing features), then $2^{m-d}$ out of the $2^m$ possible feature descriptors yield $K$, and so the probability that a randomly chosen descriptor will reconstruct the secret is $2^{-d}$.

Specific secret sharing schemes for populating this table were investigated in [15]. That paper also included an evaluation of this approach with feature descriptors of length $m=15$ derived from the keystroke timings of a user while typing an $8$-character password. There, we evaluated an implementation in which the table was additionally encrypted with the password; in this way, the technique serves to render a dictionary attack against the password up to $2^{15}$ times more difficult. Our subsequent work on voice features [16] described algorithms for generating feature descriptors from the user's voice while speaking a passphrase. It further evaluated the security and reliability of the resulting technique with feature descriptors of length $m=46$ derived from preexisting recordings of users over a phone line. However, in contrast to the keystroke case, here our evaluation presumed a table that was not encrypted with the passphrase, in order to avoid the costs of automatically recognizing the spoken passphrase (to decrypt the table). In this case, $m=46$ does not provide nearly enough security for important applications.

In this paper we address the computational challenges of performing key reconstruction on a resource-constrained PDA with more realistic parameters than our previous voice study explored. Specifically, we evaluate our implementation of this approach for feature descriptors of length $m=60$, and argue that regenerating the key $K$ can be reliably achieved on a $206$ MHz StrongARM processor by correcting for up to $c_{\rm max} \approx 5$ errors (in the sense described above). The challenges in achieving this are the front-end signal processing needed to keep $c_{\rm max}$ small so that expression (2) remains manageable, and in devising a secret sharing scheme and corresponding reconstruction algorithm that permits this reconstruction to occur in a reasonable amount of time on this platform. Consequently, we focus on these contributions in this paper, and refer the reader to [16] for the algorithmic details comprising other steps of the key (re)generation process.


next up previous
Next: Front-end Signal Processing Up: Toward Speech-Generated Cryptographic Keys Previous: Introduction
fabian 2002-08-28