Spoofing and anti-spoofing: A shared view of speaker verification, speech synthesis and voice conversion

Wu, Zhizheng; Kinnunen, Tomi; Evans, Nicholas; Yamagishi, Junichi
APSIPA ASC 2015, Asia-Pacific Signal and Information Processing Association - Annual Summit and Conference, Speech Synthesis and Voice Conversion Tutorial, December 16-19, 2015, Hong Kong, Hong Kong

Automatic speaker verification (ASV) offers a low-cost and flexible biometric solution to person authentication. While the reliability of ASV systems is now considered sufficient to support mass-market adoption, there are concerns that the technology is vulnerable to spoofing, also referred to as presentation attacks. Spoofing refers to an attack whereby a fraudster attempts to manipulate a biometric system by masquerading as another, enrolled person. On the other hand, speaker adaptation in speech synthesis and voice conversion techniques attempt to mimic a target speaker's voice automatically, and hence present a genuine threat to ASV systems. 

The research community has responded to speech synthesis and voice conversion spoofing attacks with dedicated countermeasures which aim to detect and deflect such attacks. Even if the literature shows that they can be effective, the problem is far from being solved; ASV systems remain vulnerable to spoofing, and a deeper understanding of speaker verification, speech synthesis and voice conversion will be fundamental to the pursuit of spoofing-robust speaker verification. 

While the level of interest is growing, the level of effort to develop spoofing countermeasures for ASV is lagging behind that for other biometric modalities. What's more, the vulnerabilities of ASV to spoofing are now well acknowledged. A tutorial on spoofing and anti-spoofing from the combined perspective of speaker verification, speech synthesis and voice conversion is much needed. The tutorial will attract, not only members of the growing anti-spoofing research community, but also the broader community of general practitioners in speaker verification, speech synthesis and voice conversion. 

Hong Kong
Digital Security
Eurecom Ref:
© 2015 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
See also:

PERMALINK : https://www.eurecom.fr/publication/5430