That said, there has been work on generating cryptographic keys from biometrics other than voice. The first such work of which we are aware is due to Soutar et al. [25,26,27], who describe methods for generating a repeatable cryptographic key from a fingerprint using optical computing and image processing techniques. These techniques generate a key from a two-dimensional image (a fingerprint being the obvious example), but do not seem to be well-suited to the task we pursue here. Solutions based on this technology are marketed by Bioscrypt (see https://www.bioscrypt.com/).
A different approach to generating a repeatable key based on biometric data is due to Davida, Frankel, and Matt [4]. In this scheme, a user carries a portable storage device containing (i) error-correcting parameters to decode readings of the biometric (e.g., an iris scan) with a limited number of errors to a "canonical" reading for that user, and (ii) a one-way hash of that canonical reading for verification purposes. This canonical reading, once generated, can be used as a cryptographic key, or can be hashed together with a password (using a different hash function) to obtain a key. Juels and Wattenberg [9] generalized and improved the Davida et al. scheme through a novel modification in the use of error-correcting codes, thereby shrinking the code size and achieving higher resilience. These techniques take a different approach to generating cryptographic keys from biometric readings, and reach a correspondingly different set of tradeoffs. Notably, whereas our techniques permit a user to reconstruct her key even if she is inconsistent on a majority of her feature descriptor bits (not uncommon when using voice as a biometric [6]), these techniques do not.
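To make the error-correction idea concrete, the following is a minimal sketch in the spirit of the Juels-Wattenberg modification [9], not the actual construction of [4] or [9]: it substitutes a simple repetition code for a real error-correcting code, and all names and parameters here (`enroll`, `recover`, `REP`) are hypothetical. A random key is encoded as a codeword, the biometric reading is XORed against it to form public helper data, and a hash of the key allows verification after decoding a later, noisy reading.

```python
# Fuzzy-commitment-style sketch (illustrative only; repetition code stands in
# for a proper error-correcting code).
import hashlib
import secrets

REP = 5  # each key bit repeated REP times; corrects < REP/2 flips per block

def enroll(biometric_bits, key_bits):
    """Return public helper data: (offset, hash of the key)."""
    codeword = [b for b in key_bits for _ in range(REP)]
    offset = [r ^ c for r, c in zip(biometric_bits, codeword)]
    digest = hashlib.sha256(bytes(key_bits)).hexdigest()
    return offset, digest

def recover(noisy_bits, offset, digest):
    """Decode a noisy reading back to the key, verifying against the hash."""
    noisy_codeword = [r ^ o for r, o in zip(noisy_bits, offset)]
    key_bits = []
    for i in range(0, len(noisy_codeword), REP):
        block = noisy_codeword[i:i + REP]
        key_bits.append(1 if sum(block) > REP // 2 else 0)  # majority vote
    if hashlib.sha256(bytes(key_bits)).hexdigest() != digest:
        raise ValueError("too many errors: key not recovered")
    return key_bits

# Usage: enroll with one reading; recover from a reading with a few bit flips,
# at most one per repetition block.
key = [secrets.randbelow(2) for _ in range(16)]
reading = [secrets.randbelow(2) for _ in range(16 * REP)]
offset, digest = enroll(reading, key)
noisy = list(reading)
for i in (3, 27, 61):  # three flips, each in a different block
    noisy[i] ^= 1
assert recover(noisy, offset, digest) == key
```

Note the limitation the text points out: a code like this tolerates only a bounded number of bit errors, whereas the present work must handle users who are inconsistent on a majority of their feature descriptor bits.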
More distantly related work is that of Ellison et al. for generating a cryptographic key based on answers to questions posed to a user [7]. The work is premised on the assumption that questions can be posed that the legitimate user will answer one way but others attempting to impersonate the user will answer another way. Their construction resembles one instance of our techniques, namely that of [15, Sections 5.1-5.2], and in this way their scheme achieves a degree of resilience to forgotten answers. However, Bleichenbacher and Nguyen [2] have shown that the Ellison et al. scheme is insecure, whereas our constructions appear to be much stronger. Another construction similar to that in [15, Sections 5.1-5.2] was used in the design of a forensic database, where a person's medical record can be decrypted only once a DNA sample of the person is obtained (e.g., at a crime scene) [3]. However, this scheme is also insufficient for our purposes, due to the same inadequacies as the scheme of [15, Sections 5.1-5.2].
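The property of tolerating forgotten answers can be illustrated with a toy scheme; this is emphatically not the Ellison et al. construction (which [2] showed to be insecure), only a generic sketch of the idea, with all names (`enroll`, `recover`, the threshold `t`) hypothetical. A fresh random key is locked under every size-t subset of the user's answers, so any t correctly remembered answers suffice to unlock it. A sketch like this still permits offline guessing of the answers, one of the weaknesses such schemes must contend with.

```python
# Toy illustration of resilience to forgotten answers (not a secure scheme).
import hashlib
import itertools
import secrets

def _mask(answer_list):
    """Derive a 32-byte pad from a list of answers (stand-in for a real KDF)."""
    return hashlib.sha256("\x00".join(answer_list).encode()).digest()

def enroll(answers, t):
    """Lock a fresh random key under every size-t subset of the answers."""
    key = secrets.token_bytes(32)
    check = hashlib.sha256(key).hexdigest()
    table = {
        combo: bytes(k ^ m for k, m in
                     zip(key, _mask([answers[i] for i in combo])))
        for combo in itertools.combinations(range(len(answers)), t)
    }
    return key, (table, check)

def recover(answers, helper):
    """Unlock the key from any t correctly remembered answers, else None."""
    table, check = helper
    for combo, blob in table.items():
        candidate = bytes(b ^ m for b, m in
                          zip(blob, _mask([answers[i] for i in combo])))
        if hashlib.sha256(candidate).hexdigest() == check:
            return candidate
    return None
```

With four enrolled answers and t = 3, a user who forgets any single answer can still recover the key, while forgetting two cannot.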
Impersonation attacks using recordings of the user speaking phrases other than her passphrase, as we explored in Section 5.2, have been previously studied for the purpose of fooling speaker verification systems (e.g., [8,18,14,23]). The approach taken in these works is somewhat different from our exploration here, however. Notably, in [14,23], the authors describe synthesizing a passphrase using a speaker-independent model and then adapting the pitch and duration of the synthesized passphrase based on relatively few recordings of the user. The authors give evidence that even these simple attacks can make it difficult to set acceptance thresholds for a speaker verification system. In future work we hope to explore how these techniques can be applied in the context of our work.