Graduate school of engineering and research center in digital sciences

Audio-based video genre identification

Rouvier, Mickael; Oger, Stanislas; Linarès, Georges; Matrouf, Driss; Merialdo, Bernard; Li, Yingbo

IEEE/ACM Transactions on Audio, Speech, and Language Processing, Volume 23, N°6, June 2015

This paper presents investigations into the automatic identification of video genre through analysis of the audio channel. Genre refers to editorial styles such as commercials, movies, and sports. We propose and evaluate several methods based on both low- and high-level descriptors, in the cepstral or time domains, as well as on analysis of the global structure of the document and of its linguistic content. The proposed features are then combined and their complementarity is evaluated. On a database of single-story web videos, the best audio-only system achieves a Classification Error Rate (CER) of 9%. Finally, we evaluate the complementarity of the proposed audio features with the video features classically used for Video Genre Identification (VGI). Results demonstrate that the modalities are complementary for genre recognition, with the final audio-video system reaching 6% CER.
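The abstract reports results as a Classification Error Rate (CER) and evaluates combining audio and video systems. As an illustration only, the sketch below computes CER as the fraction of misclassified videos and shows one generic way to combine two systems: a weighted sum of per-genre scores (late fusion). The function names, the equal weighting and the toy scores are hypothetical assumptions; this record does not describe the paper's actual combination scheme, and the numbers are not results from the paper.

```python
import numpy as np


def classification_error_rate(y_true, y_pred):
    """Fraction of videos whose predicted genre differs from the reference genre."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))


def late_fusion(audio_scores, video_scores, audio_weight=0.5):
    """Hypothetical score-level fusion: weighted sum of per-genre scores.

    audio_scores, video_scores: arrays of shape (n_videos, n_genres).
    Returns the index of the highest fused score for each video.
    """
    fused = (audio_weight * np.asarray(audio_scores)
             + (1.0 - audio_weight) * np.asarray(video_scores))
    return fused.argmax(axis=1)


# Toy example (invented scores): genres 0 = commercial, 1 = movie, 2 = sports.
y_true = [0, 1, 2]
audio = np.array([[0.6, 0.3, 0.1],
                  [0.2, 0.5, 0.3],
                  [0.5, 0.1, 0.4]])   # audio system misses the 3rd video
video = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.4, 0.5],
                  [0.1, 0.1, 0.8]])   # video system misses the 2nd video

audio_cer = classification_error_rate(y_true, audio.argmax(axis=1))
video_cer = classification_error_rate(y_true, video.argmax(axis=1))
fused_cer = classification_error_rate(y_true, late_fusion(audio, video))
print(audio_cer, video_cer, fused_cer)
```

In this toy case each single system makes a different error, so the fused scores classify all three videos correctly; this mirrors, in miniature, why complementary modalities can lower CER, without claiming anything about the paper's data.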


Title: Audio-based video genre identification
Keywords: Automatic classification, linguistic feature extraction, video genre classification
Type: Journal
Language: English
City:
Date:
Department: Data Science
Eurecom ref: 4589
Copyright: © 2015 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Bibtex: @article{EURECOM+4589, doi = {http://dx.doi.org/10.1109/TASLP.2014.2387411}, year = {2015}, month = {01}, title = {{A}udio-based video genre identification}, author = {{R}ouvier, {M}ickael and {O}ger, {S}tanislas and {L}inar{\`e}s, {G}eorges and {M}atrouf, {D}riss and {M}erialdo, {B}ernard and {L}i, {Y}ingbo }, journal = {{IEEE}/{ACM} {T}ransactions on {A}udio, {S}peech, and {L}anguage {P}rocessing, {V}olume 23, {N}°6, {J}une 2015}, url = {http://www.eurecom.fr/publication/4589} }