As a result of the growing need for secure systems and services, the design of reliable personal recognition systems is becoming increasingly important. In this context, biometric systems, which use physiological and/or behavioural traits such as fingerprints, face, iris or voice for the automatic recognition of individuals, have a number of advantages over conventional authentication methods such as PINs, cards or passports.
In spite of these advantages, however, a growing body of independent work shows that all biometric systems are vulnerable to subversion, either through evasion in the case of surveillance or, as is the focus of this thesis, through spoofing in the case of authentication. Surprisingly, there is only a small (but growing) body of work to develop countermeasures which can offer some protection from spoofing attacks.
This thesis presents some of the first solutions to this problem in the case of automatic speaker verification (ASV) systems.
First, the thesis reports an analysis of potential vulnerabilities and introduces an approach to evaluate ASV system performance in the face of spoofing. It presents the first comparison of established attacks (e.g.\ voice conversion and speech synthesis) and introduces a new threat in the form of non-speech signals (e.g.\ artificial signals). Also considered is the difference between spoofing attacks in terms of the effort required for their successful implementation. The thesis reports assessments with a number of ASV systems, from the standard GMM-UBM approach to the state-of-the-art i-vector scheme with PLDA post-processing. Experimental results show that all systems are vulnerable to spoofing. Voice conversion is the most effective attack and provokes false acceptance rates of over 70%.
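The evaluation approach described above can be sketched as follows. This is an illustrative example only, not the thesis protocol: the decision threshold is fixed on licit (zero-effort impostor) trials, and the false acceptance rate is then re-measured on spoofed trials. All scores and the threshold value are hypothetical.

```python
# Illustrative sketch of a spoofing vulnerability evaluation.
# The scores and threshold below are hypothetical, not from the thesis.

def far_at_threshold(impostor_scores, threshold):
    """False acceptance rate: fraction of impostor (or spoofed) trials
    whose verification score meets or exceeds the decision threshold."""
    accepted = sum(1 for s in impostor_scores if s >= threshold)
    return accepted / len(impostor_scores)

# Threshold fixed on licit trials, e.g. at the baseline EER operating point.
threshold = 2.0
zero_effort = [0.5, 1.2, 1.9, 2.1, 0.8, 1.4, 2.3, 0.9, 1.1, 1.6]
spoofed = [2.4, 3.1, 1.8, 2.9, 2.2, 3.5, 1.7, 2.8, 3.0, 2.6]

print(far_at_threshold(zero_effort, threshold))  # baseline FAR: 0.2
print(far_at_threshold(spoofed, threshold))      # FAR under spoofing: 0.8
```

The key point is that the system operating point is not re-tuned for the attack: spoofing inflates the false acceptance rate at the very threshold chosen under licit conditions.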
Second, the thesis presents three new spoofing countermeasures and their integration with state-of-the-art ASV systems. The first countermeasure is based on the detection of repetitive patterns and is effective in detecting artificial signals. The second is based on the analysis of feature dynamics and is effective in detecting converted voices. Like all competing approaches, both of these countermeasures make inappropriate use of prior knowledge of the specific spoofing attack. The third countermeasure therefore introduces for the first time the notion of generalised countermeasures, here implemented with one-class classifiers as a solution to outlier detection (unseen attacks). It exploits local binary pattern (LBP) analysis of speech spectrograms for feature extraction and one-class support vector machine (SVM) classifiers. The generalised countermeasure, which is therefore the most practically useful, achieves equal error rates (EERs) of 5%, 0.1% and 0% in the detection of voice conversion, speech synthesis and artificial signal spoofing attacks respectively.
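The generalised countermeasure can be sketched as below. This is a minimal illustration of the idea, not the thesis implementation: the LBP variant, window sizes, SVM parameters and the synthetic "spectrograms" are all assumptions. The essential point is that the one-class SVM is trained only on genuine speech features, so unseen spoofing attacks surface as outliers.

```python
# Illustrative sketch: LBP histogram features from a spectrogram-like matrix,
# classified by a one-class SVM trained only on genuine data. All parameters
# and data below are hypothetical, not from the thesis.
import numpy as np
from sklearn.svm import OneClassSVM

def lbp_histogram(spec):
    """Minimal 8-neighbour LBP: compare each interior cell with its eight
    neighbours, build an 8-bit code, return the normalised 256-bin histogram."""
    c = spec[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        neigh = spec[1 + dy:spec.shape[0] - 1 + dy,
                     1 + dx:spec.shape[1] - 1 + dx]
        code |= (neigh >= c).astype(np.uint8) << np.uint8(bit)
    hist, _ = np.histogram(code, bins=256, range=(0, 256))
    return hist / hist.sum()

rng = np.random.default_rng(0)
# Hypothetical "genuine" spectrograms (noise-like) vs. "spoofed" (tonal texture).
genuine = [lbp_histogram(rng.normal(size=(64, 64))) for _ in range(40)]
spoofed = [lbp_histogram(np.add.outer(np.sin(np.arange(64.0)),
                                      rng.normal(scale=0.1, size=64)))
           for _ in range(10)]

# One-class training: only genuine data is seen; no attack-specific knowledge.
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(genuine)
print(clf.predict(genuine[:5]))  # inliers predicted as +1
print(clf.predict(spoofed[:5]))  # outliers predicted as -1
```

Because the classifier never sees spoofed examples during training, the same model generalises to attacks that were unknown at design time, which is what distinguishes this countermeasure from the two attack-specific ones.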