In this paper, we present the architecture of our speaker-based indexing system. The goal is to recognize, from their voices, the sequence of people engaged in a conversation. In our context, we make no assumptions about prior knowledge of the speaker characteristics (no speaker model, no speech model, no training phase), and the number of speakers is unknown. However, we assume that people do not speak simultaneously. For each stage of our speaker-based indexing system, we detail the constraints and propose or review techniques that satisfy them. Finally, evaluation methods for each stage are examined.
A first step into speaker-based indexing
CBMI 1999, 1st European Workshop on Content-Based Multimedia Indexing, October 25-27, 1999, Toulouse, France
© 1999 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Permalink: https://www.eurecom.fr/publication/254