End-to-end automatic speaker verification with evolving recurrent neural networks

Valenti, Giacomo; Daniel, Adrien; Evans, Nicholas
ODYSSEY 2018, The Speaker and Language Recognition Workshop, June 26-29, 2018, Les Sables d'Olonne, France

The state of the art in automatic speaker verification (ASV) is undergoing a shift from a reliance on hand-crafted features and sequentially optimized toolchains towards end-to-end approaches. Many of the latest algorithms still rely on frame-blocking and stacked, hand-crafted features and fixed model topologies such as layered, deep neural networks. This paper reports a fundamentally different exploratory approach which operates on raw audio and which evolves both the weights and the topology of a neural network solution. The paper reports what is, to the authors' best knowledge, the first investigation of evolving recurrent neural networks for truly end-to-end ASV. The algorithm avoids a reliance upon hand-crafted features and fixed topologies and also learns to discard unreliable output samples. Resulting networks are of low complexity and memory footprint. The approach is thus well suited to embedded systems. With computational complexity making experimentation with standard datasets impracticable, the paper reports modest proof-of-concept experiments designed to evaluate potential. Results are equivalent to those obtained using a traditional GMM baseline system and suggest that the proposed end-to-end approach merits further investigation; avenues for future research are described and have potential to deliver significant improvements in performance.

DOI: 10.21437/Odyssey.2018-47
Type: Conference
City: Les Sables d'Olonne
Date: 2018-06-26
Department: Digital Security
Eurecom Ref: 5540
Copyright: © ISCA. Personal use of this material is permitted. The definitive version of this paper was published in ODYSSEY 2018, The Speaker and Language Recognition Workshop, June 26-29, 2018, Les Sables d'Olonne, France and is available at: http://dx.doi.org/10.21437/Odyssey.2018-47

Permalink: https://www.eurecom.fr/publication/5540