Sound Event Detection and Classification in Everyday Environments

Tuomas VIRTANEN - Professor at the Laboratory of Signal Processing, Tampere University of Technology (TUT), Finland
Digital Security

Date: -
Location: Eurecom

Abstract: Sound scenes in our everyday environments, such as the home, street, office, car, and grocery store, convey a lot of information about the events taking place in them - for example, a car passing by, somebody knocking on a door, or a dog barking. Computational analysis of these scenes, i.e., machine listening, has many applications, for example in context-aware devices, acoustic surveillance, and multimedia indexing. Analyzing sounds in realistic everyday environments is challenging because of the acoustic diversity of natural sounds, multiple sources being present simultaneously, and reverberation. In this talk we present a generic machine listening approach for analyzing everyday soundscapes by detecting and classifying the sound events in them. We present state-of-the-art methodology based on various kinds of deep neural network architectures: we show how overlapping sounds can be recognized with multilabel neural networks, how the temporal dynamics of natural sounds can be modeled efficiently with recurrent neural networks, and how relevant acoustic features can be learned automatically with convolutional neural networks. We also present advanced training procedures based on transfer learning, which allow a sound event detection system to be adapted to recognize new sound classes, and briefly review the results of the recent DCASE 2017 public evaluation challenge. Audio and video demonstrations will be given.

BIO: Tuomas Virtanen is Professor at the Laboratory of Signal Processing, Tampere University of Technology (TUT), Finland, where he leads the Audio Research Group. He received the M.Sc. and Doctor of Science degrees in information technology from TUT in 2001 and 2006, respectively. He has also worked as a research associate at the Cambridge University Engineering Department, UK.
He is known for his pioneering work on single-channel sound source separation using non-negative matrix factorization techniques and their application to noise-robust speech recognition and music content analysis. More recently he has made significant contributions to sound event detection in everyday environments. In addition to the above topics, his research interests include content analysis of audio signals in general and machine learning. He has authored more than 100 scientific publications on these topics, which have been cited more than 5000 times. He received the IEEE Signal Processing Society 2012 best paper award for his article "Monaural Sound Source Separation by Nonnegative Matrix Factorization with Temporal Continuity and Sparseness Criteria", as well as three other best paper awards. He is an IEEE Senior Member, a member of the Audio and Acoustic Signal Processing Technical Committee of the IEEE Signal Processing Society, an Associate Editor of the IEEE/ACM Transactions on Audio, Speech, and Language Processing, and a recipient of an ERC 2014 Starting Grant.
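To illustrate the multilabel idea mentioned in the abstract: unlike single-label classification with a softmax, a multilabel network gives each sound class its own independent sigmoid output, so several overlapping events can be declared active in the same audio frame. The sketch below is purely illustrative - the class names, logit values, and threshold are assumptions, not material from the talk.

```python
import math

# Hypothetical event classes for illustration only.
CLASSES = ["car_passing", "door_knock", "dog_bark"]

def sigmoid(x):
    """Standard logistic function, applied per class."""
    return 1.0 / (1.0 + math.exp(-x))

def detect_events(frame_logits, threshold=0.5):
    """Multilabel decision for one frame: each class gets an
    independent sigmoid score, so any subset of classes - including
    several at once - can exceed the threshold simultaneously."""
    scores = [sigmoid(z) for z in frame_logits]
    return [c for c, s in zip(CLASSES, scores) if s >= threshold]

# Example logits, as if produced by a (hypothetical) network
# for a frame where a car passes while a dog barks.
print(detect_events([2.0, -1.5, 0.7]))  # → ['car_passing', 'dog_bark']
```

The key design choice is per-class sigmoids with an independent threshold, rather than a softmax that forces exactly one winner; this is what lets a detector report simultaneous, overlapping sound events.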