DISTBIC : A speaker-based segmentation for audio data indexing
Speech Communication, Volume 32, N°1-2, 2000
In this paper, we address the problem of speaker-based segmentation, which is the first necessary step for several indexing tasks. It aims to extract homogeneous segments containing the longest possible utterances produced by a single speaker. In our context, no assumption is made about prior knowledge of the speaker or speech signal characteristics (neither speaker model, nor speech model). However, we assume that people do not speak simultaneously and that we have no real-time constraints. We review existing techniques and propose a new segmentation method, which combines two different segmentation techniques. This method, called DISTBIC, is organized into two passes: first the most likely speaker turns are detected, and then they are validated or discarded. The advantage of our algorithm is its efficiency in detecting speaker turns even close to one another (i.e., separated by a few seconds).
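The two-pass organization described in the abstract can be illustrated with a minimal sketch. The abstract does not specify either pass, so everything concrete here is an assumption made for illustration: pass 1 takes local maxima of a generalized likelihood ratio (GLR) distance between adjacent sliding windows, pass 2 keeps a candidate only if a BIC-style difference between its neighbouring segments is positive, and features are univariate Gaussian values. The function names, the window, step, and penalty parameters are hypothetical and are not taken from the paper.

```python
import math
from statistics import pvariance


def n_log_var(x):
    """n * log(variance) of a 1-D Gaussian fit; a small floor avoids log(0)."""
    return len(x) * math.log(pvariance(x) + 1e-9)


def glr_distance(a, b):
    """GLR distance: how much better two Gaussians fit a and b than one does."""
    return 0.5 * (n_log_var(a + b) - n_log_var(a) - n_log_var(b))


def delta_bic(a, b, penalty_weight=1.0):
    """GLR minus a BIC penalty; a 1-D Gaussian has 2 parameters (mean, variance)."""
    n_params = 2
    return glr_distance(a, b) - penalty_weight * 0.5 * n_params * math.log(len(a) + len(b))


def two_pass_segmentation(features, window=100, step=10, penalty_weight=1.0):
    """Return validated change points (frame indices) in a list of floats."""
    # Pass 1 (assumed): distance curve between adjacent windows, keep local maxima.
    positions = list(range(window, len(features) - window + 1, step))
    dist = [glr_distance(features[t - window:t], features[t:t + window])
            for t in positions]
    candidates = [positions[i] for i in range(1, len(dist) - 1)
                  if dist[i] > dist[i - 1] and dist[i] >= dist[i + 1]]

    # Pass 2 (assumed): validate each candidate against the segments around it.
    validated = []
    start = 0
    for idx, t in enumerate(candidates):
        end = candidates[idx + 1] if idx + 1 < len(candidates) else len(features)
        if delta_bic(features[start:t], features[t:end], penalty_weight) > 0:
            validated.append(t)
            start = t  # the next left segment begins at the last confirmed turn
    return validated
```

Calling `two_pass_segmentation(values)` on a list of per-frame feature values returns the frame indices kept after the second pass. Raising `penalty_weight` discards more candidates, which trades missed turns against false alarms.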
| Type: | Journal |
| Language: | English |
| Date: | September 2000 |
| Department: | Multimedia Communications |
| Eurecom ref: | 564 |
| Copyright: | © Elsevier. Personal use of this material is permitted. The definitive version of this paper was published in Speech Communication, Volume 32, N°1-2, 2000 and is available at : http://dx.doi.org/10.1016/S0167-6393(00)00027-3 |
| Bibtex: | @article{EURECOM+564, doi = {http://dx.doi.org/10.1016/S0167-6393(00)00027-3}, year = {2000}, month = {09}, title = {{DISTBIC} : {A} speaker-based segmentation for audio data indexing}, author = {{D}elacourt, {P}errine and {W}ellekens, {C}hristian {J}}, journal = {{S}peech {C}ommunication, {V}olume 32, {N}°1-2, 2000}, url = {http://www.eurecom.fr/publication/564} } |
Permalink: http://www.eurecom.fr/publication/564


