Graduate School and Research Center in Digital Sciences

Latent representation learning for artificial bandwidth extension using a conditional variational auto-encoder

Bachhav, Pramod; Todisco, Massimiliano; Evans, Nicholas

ICASSP 2019, International Conference on Acoustics, Speech, and Signal Processing, 12-17 May 2019, Brighton, UK

Artificial bandwidth extension (ABE) algorithms can improve speech quality when wideband devices are used with narrowband devices or infrastructure. Most ABE solutions employ some form of memory, implying high-dimensional feature representations that increase both latency and complexity. Dimensionality reduction techniques have thus been developed to preserve efficiency. These entail the extraction of compact, low-dimensional representations that are then used with a standard regression model to estimate high-band components. Previous work shows that some form of supervision is crucial to the optimisation of dimensionality reduction techniques for ABE. This paper reports the first application of conditional variational auto-encoders (CVAEs) for supervised dimensionality reduction specifically tailored to ABE. CVAEs, a form of directed graphical model, are exploited to model higher-dimensional log-spectral data and to extract latent narrowband representations. When compared to results obtained with alternative dimensionality reduction techniques, objective and subjective assessments show that the probabilistic latent representations learned with CVAEs produce bandwidth-extended speech signals of notably better quality.
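
For readers unfamiliar with variational auto-encoders, the following is a minimal sketch of the two quantities that a Gaussian (C)VAE training objective typically combines: a reconstruction error and a KL regulariser on the latent code. It is not the authors' implementation. It assumes a diagonal-Gaussian posterior, a standard-normal prior and a squared-error reconstruction term, none of which are specified in the abstract. The function names are hypothetical, and the inputs `mu`, `log_var` and `reconstruction` stand in for the outputs of encoder and decoder networks that, in a conditional model, would also receive a conditioning input.

```python
import math
import random


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def reparameterise(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) (the reparameterisation trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def negative_elbo(target, reconstruction, mu, log_var):
    """Assumed loss: squared reconstruction error plus the KL regulariser."""
    recon = sum((t - r) ** 2 for t, r in zip(target, reconstruction))
    return recon + gaussian_kl(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.5, -0.2], [0.0, -1.0]  # hypothetical encoder outputs
    z = reparameterise(mu, log_var, rng)    # low-dimensional latent code
    print(len(z), gaussian_kl(mu, log_var))
```

In a dimensionality-reduction setting such as the one described above, the latent code `z` (or its mean `mu`) would play the role of the compact representation passed to the downstream regression model.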


Title: Latent representation learning for artificial bandwidth extension using a conditional variational auto-encoder
Keywords: variational auto-encoder, latent variable, artificial bandwidth extension, dimensionality reduction, speech quality
Type: Conference
Language: English
City: Brighton
Country: UNITED KINGDOM
Department: Digital Security
Eurecom ref: 5817
Copyright: © 2019 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Bibtex:
@inproceedings{EURECOM+5817,
  doi = {http://dx.doi.org/10.1109/ICASSP.2019.8683611},
  year = {2019},
  title = {{L}atent representation learning for artificial bandwidth extension using a conditional variational auto-encoder},
  author = {{B}achhav, {P}ramod and {T}odisco, {M}assimiliano and {E}vans, {N}icholas},
  booktitle = {{ICASSP} 2019, {I}nternational {C}onference on {A}coustics, {S}peech, and {S}ignal {P}rocessing, 12-17 {M}ay 2019, {B}righton, {UK}},
  address = {{B}righton, {UNITED} {KINGDOM}},
  month = {05},
  url = {http://www.eurecom.fr/publication/5817}
}