Blind audio source separation using short+long term AR source models and iterative Itakura-Saito distance minimization

Schutz, Antony; Slock, Dirk T M
IWAENC 2010, International Workshop on Acoustic Echo and Noise Control, August 30-September 2nd, 2010, Tel Aviv, Israel

Blind audio source separation (BASS) arises in a number of applications in speech and music processing, such as speech enhancement, speaker diarization, and automated music transcription. Generally, BASS methods consider multichannel signal capture. The single-microphone case is the most difficult underdetermined case, but it often arises in practice. In the approach considered here, the main source identifiability comes from exploiting the presumed quasi-periodic nature of sources via long-term autoregressive (AR) modeling. Indeed, musical note signals are quasi-periodic, and so is voiced speech, which constitutes the most energetic part of speech signals. We furthermore exploit (e.g. speaker- or instrument-related) prior information on the spectral envelope of the source signals via short-term AR modeling. We present an iterative method based on the minimization of the Itakura-Saito distance for estimating the source parameters directly from the mixture using a frame-based analysis.
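As a rough illustration of the quantities mentioned in the abstract (not the authors' implementation), the Python sketch below builds a source power spectrum as the cascade of a short-term AR envelope and a one-tap long-term (pitch) predictor, and evaluates the Itakura-Saito distance between an observed frame spectrum and that model. The filter orders, pitch lag, and gains are arbitrary assumptions chosen for illustration.

import numpy as np

def ar_spectrum(ar_coeffs, sigma2, n_fft=512):
    """Power spectrum of a short-term AR model: sigma2 / |A(e^{jw})|^2."""
    a = np.concatenate(([1.0], -np.asarray(ar_coeffs, dtype=float)))
    A = np.fft.rfft(a, n_fft)
    return sigma2 / np.abs(A) ** 2

def long_term_spectrum(gain, lag, n_fft=512):
    """Power response of a one-tap long-term (pitch) predictor 1/(1 - g z^-T)."""
    b = np.zeros(lag + 1)
    b[0], b[lag] = 1.0, -gain
    B = np.fft.rfft(b, n_fft)
    return 1.0 / np.abs(B) ** 2

def itakura_saito(p_obs, p_model, eps=1e-12):
    """Itakura-Saito distance between observed and model power spectra."""
    r = (p_obs + eps) / (p_model + eps)
    return np.mean(r - np.log(r) - 1.0)

# Illustrative source model: short-term envelope times long-term (harmonic) comb.
st = ar_spectrum(ar_coeffs=[1.3, -0.7], sigma2=1.0)   # spectral envelope (AR(2), assumed)
lt = long_term_spectrum(gain=0.9, lag=40)             # quasi-periodicity (pitch lag assumed)
p_model = st * lt

# In a frame-based analysis, the observed mixture spectrum would be compared to
# the sum of all source model spectra; here a noisy copy of one source stands in.
rng = np.random.default_rng(0)
p_obs = p_model * np.exp(0.1 * rng.standard_normal(p_model.shape))
print("IS distance:", itakura_saito(p_obs, p_model))

In the frame-based scheme described in the abstract, a distance of this kind would be minimized iteratively over the short-term and long-term AR parameters of all sources, with the mixture spectrum playing the role of the observation.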


Type:
Conference
City:
Tel Aviv
Date:
2010-08-30
Department:
Communication systems
Eurecom Ref:
3187
Copyright:
Copyright VDE Verlag. Personal use of this material is permitted. The definitive version of this paper was published in IWAENC 2010, International Workshop on Acoustic Echo and Noise Control, August 30-September 2nd, 2010, Tel Aviv, Israel and is available at:

PERMALINK: https://www.eurecom.fr/publication/3187