For humans, sound is valuable mostly for its meaning: the voice carries spoken language, music, and artistic intent. The physiology of hearing is highly developed, as is our understanding of the underlying processes. Replicating this analysis with a computer remains a challenge: in many respects, machine capabilities do not match those of human beings when it comes to recognizing speech or musical instruments from sound alone. The problem of source separation arises when several audio sources are active at the same moment, mixed together and acquired by one or more sensors (a single one in our case). In such a situation it is natural for a human to separate and recognize the different speakers. This problem, known as the cocktail party problem, has received a lot of attention but is still open. Since we work with only one observation, no spatial information can be used, and a model of the sources is needed. The second part deals with musical signal processing and is composed of several appendices. The task we investigate is connected to automatic music transcription, the process of analyzing the content of a song in order to generate a music score. Music, however, cannot be reduced to a succession of notes, and an accurate transcriber should also detect other performance characteristics such as interpretation effects.
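The single-sensor setting described above can be sketched with a small synthetic example: the observation is the sum of the source signals, and with one observation per sample the inverse problem is underdetermined. This is only an illustrative sketch; all signal parameters here are assumptions, not taken from the thesis.

```python
# Minimal sketch of the single-sensor (mono) mixing model:
# the observation x(t) is the sum of the source signals s_i(t),
# with no spatial information available to help separate them.
# All names and parameters are illustrative assumptions.
import numpy as np

fs = 8000                              # sampling rate in Hz (assumed)
t = np.arange(0, 1.0, 1.0 / fs)        # one second of samples

# Two synthetic "sources": sinusoids at different pitches.
s1 = np.sin(2 * np.pi * 440.0 * t)     # A4
s2 = np.sin(2 * np.pi * 554.37 * t)    # C#5

# Mono mixture: a single sensor records the sum of the sources.
x = s1 + s2

# With one observation and two unknowns per sample, recovering
# s1 and s2 from x alone is ill-posed: a model of the sources
# (e.g. spectral or statistical priors) is required.
print(x.shape)
```

Because no spatial cues are available, separation methods in this setting must exploit structure within the sources themselves, which is why the abstract stresses that a model of the sources is needed.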
Some contributions to music signal processing and to mono-microphone blind audio source separation
Communication Systems (Systèmes de Communication)
© TELECOM ParisTech. Personal use of this material is permitted. The definitive version of this work was published as a thesis and is available at:
PERMALINK: https://www.eurecom.fr/publication/3272