Indexing video soundtracks is an important problem for navigation in multimedia databases. Based on word-spotting techniques, such indexing must meet demanding specifications: fast response to queries, a concise representation of the processed speech to limit storage, speaker-independent operation, and easy characterization of any word by its phonemic spelling. A solution is proposed that is based on phonemic lattices and on a division of the indexing process into an off-line part and an online part. Previous work, based on frame labelling and a maximum likelihood criterion, is modified to accommodate this new approach, which relies on a maximum a posteriori (MAP) criterion. The REMAP algorithm implements this MAP criterion for training. It has several advantages: it maximizes a global discriminant criterion, it avoids the difficult problem of detecting phoneme transitions during training, and it is well suited to a hybrid hidden Markov model (HMM) and neural network (NN) approach.
REMAP for video soundtrack indexing
ICASSP 1997, 22nd IEEE International Conference on Acoustics, Speech, and Signal Processing, April 21-24, 1997, Munich, Germany
© 1997 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
PERMALINK : https://www.eurecom.fr/publication/554