Audio-visual intent-to-speak detection for human-computer interaction

de Cuetos, Philippe; Neti, Chalapathy; Senior, Andrew W.
ICASSP 2000, 25th IEEE International Conference on Acoustics, Speech, and Signal Processing, June 5-9, 2000, Istanbul, Turkey

This paper introduces a practical system that aims to detect a user's intent to speak to a computer by considering both audio and visual cues. The system is designed to turn on the microphone for speech recognition intuitively, without the user having to click a mouse, making communication between users and computers more natural and human-like. The first step detects a frontal face in the image from a simple desktop video camera, using well-known image-processing techniques for face and facial-feature detection on a single image. The second step is an audio-visual speech event detection that combines visual and audio indications of speech. In this paper, we consider visual measures of speech activity as well as audio energy to determine whether the previously detected user is actually speaking.
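
The abstract does not specify the paper's actual visual measures or fusion rule, so the following is only a minimal sketch of the second step under stated assumptions. It assumes short-time audio energy in dB as the audio cue and the mean absolute grey-level difference between consecutive crops of the mouth region as the visual cue; both are stand-ins, not necessarily the authors' features. The thresholds, the AND-fusion rule, and the hypothetical helpers `frame_energy_db`, `mouth_motion`, `is_speaking`, and `mic_on_frames` (including its `min_run` debounce parameter) are illustrative choices, not taken from the paper.

```python
import math
from typing import List, Sequence

# Hypothetical thresholds for illustration only; a real system would tune
# them on recorded data from the target camera and microphone.
ENERGY_THRESHOLD_DB = -40.0  # short-time audio energy, dB re. full scale
MOTION_THRESHOLD = 8.0       # mean absolute grey-level change in the mouth crop


def frame_energy_db(samples: Sequence[float]) -> float:
    """Short-time energy of one audio frame (samples in [-1, 1]), in dB."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    # An all-zero (silent) frame has no finite dB value.
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def mouth_motion(prev_roi: Sequence[Sequence[int]],
                 curr_roi: Sequence[Sequence[int]]) -> float:
    """Mean absolute difference between two equal-size grey-level mouth crops."""
    total, count = 0, 0
    for prev_row, curr_row in zip(prev_roi, curr_roi):
        for p, c in zip(prev_row, curr_row):
            total += abs(c - p)
            count += 1
    return total / count if count else 0.0


def is_speaking(energy_db: float, motion: float,
                energy_threshold: float = ENERGY_THRESHOLD_DB,
                motion_threshold: float = MOTION_THRESHOLD) -> bool:
    """AND-fusion: the detected user counts as speaking only with sound AND lip motion."""
    return energy_db > energy_threshold and motion > motion_threshold


def mic_on_frames(decisions: Sequence[bool], min_run: int = 3) -> List[bool]:
    """Open the microphone only after `min_run` consecutive speaking frames."""
    out: List[bool] = []
    run = 0
    for speaking in decisions:
        run = run + 1 if speaking else 0
        out.append(run >= min_run)
    return out


if __name__ == "__main__":
    loud = [0.1, -0.1] * 80                 # mean square about 0.01, i.e. about -20 dB
    silent = [0.0] * 160
    still = [[100] * 4 for _ in range(4)]
    moving = [[120] * 4 for _ in range(4)]  # 20 grey levels brighter

    print(is_speaking(frame_energy_db(loud), mouth_motion(still, moving)))    # True
    print(is_speaking(frame_energy_db(loud), mouth_motion(still, still)))     # False: sound, no lip motion
    print(is_speaking(frame_energy_db(silent), mouth_motion(still, moving)))  # False: lip motion, no sound
    print(mic_on_frames([True, True, True, False, True]))  # [False, False, True, False, False]
```

Requiring both cues is one simple way to reject speech from other people in the room (audio without lip motion) and silent mouth movements (lip motion without audio); a weighted score of the two measures would be an equally plausible alternative.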


Type: Conference
City: Istanbul
Date: 2000-06-09
Department: Digital Security
Eurecom Ref: 413
Copyright: © 2000 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Permalink: https://www.eurecom.fr/publication/413