Most artificial bandwidth extension (ABE) algorithms exploit contextual information, or memory, captured through static or dynamic features extracted from neighbouring speech frames. The use of memory leads to higher-dimensional features and increased computational complexity. When information from look-ahead frames is also used, latency increases too. Past work points to the benefit of exploiting memory for ABE in the form of dynamic features with a standard regression model. Even so, the literature lacks a quantitative analysis of the relative benefit of explicit memory inclusion. The research presented in this thesis assesses the degree to which explicit memory is beneficial and, furthermore, reports a number of techniques that allow for its inclusion without significant increases in latency or computational complexity. Benefits are demonstrated through both a quantitative analysis based on an information-theoretic measure and subjective listening tests. Key contributions relate to the preservation of computational efficiency through dimensionality reduction in the form of principal component analysis, semi-supervised stacked autoencoders and conditional variational autoencoders. The latter two techniques optimise dimensionality reduction to deliver superior ABE performance.
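One of the techniques named above, principal component analysis, can be sketched briefly: stacking neighbouring frames captures memory but inflates the feature dimension, and PCA projects the stacked features back down to a compact representation. The frame counts, dimensions and padding choice below are illustrative assumptions, not the thesis's actual configuration.

```python
import numpy as np

def stack_with_memory(frames, context=2):
    """Concatenate each frame with `context` neighbouring frames on each side
    (edge-padded), yielding memory-augmented features of higher dimension."""
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(frames)] for i in range(2 * context + 1)])

def pca_reduce(X, n_components):
    """Project zero-mean data onto its leading principal components via SVD."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions of the centred data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(0)
frames = rng.standard_normal((100, 10))         # 100 frames of 10-dim features
stacked = stack_with_memory(frames, context=2)  # 50-dim memory-augmented features
reduced = pca_reduce(stacked, n_components=10)  # compact 10-dim representation
print(stacked.shape, reduced.shape)             # (100, 50) (100, 10)
```

The point of the sketch is that the regression model downstream sees a feature vector no larger than the original static one, so memory is included without a proportional increase in computational cost.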
Explicit memory inclusion for efficient artificial bandwidth extension
© EURECOM. Personal use of this material is permitted. The definitive version of this thesis is available at:
PERMALINK: https://www.eurecom.fr/publication/5971