Deep multimodal features for movie genre and interestingness prediction

Ben Ahmed, Olfa; Huet, Benoit
CBMI 2018, International Conference on Content-Based
Multimedia Indexing, 4-6 September 2018, La Rochelle, France

Best Paper Award

In this paper, we propose a multimodal framework for predicting the interestingness of video segments based on the genre and affective impact of movie content. We hypothesize that the emotional characteristics and impact of a video indicate its genre, which can in turn be a factor in identifying the perceived interestingness of a particular segment (shot) within the entire media. Our approach relies on audio-visual deep features for perceptual content analysis. The multimodal content is quantified in a mid-level representation that describes each audio-visual segment as a distribution over genres (currently action, drama, horror, romance, and sci-fi). A segment may be more characteristic of a media item, and therefore more interesting, than a segment whose content has a neutral genre profile. Having determined the genre of individual video segments, we train a classifier to produce an interestingness factor, which is then used to rank segments. We evaluate our approach on the MediaEval 2017 Predicting Media Interestingness Task dataset (PMIT) and demonstrate that it outperforms existing video interestingness approaches on this dataset in terms of Mean Average Precision.
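The mid-level representation described in the abstract can be sketched as follows. This is an illustrative assumption, not the authors' implementation: the function names, the example values, and the neutrality-based scoring stand-in (the paper trains a classifier on the genre distributions instead) are all hypothetical.

```python
# Hypothetical sketch: each video segment is described as a probability
# distribution over genres, and an interestingness score derived from
# that distribution is used to rank segments. All names and the scoring
# rule are illustrative assumptions, not the paper's actual code.

GENRES = ["action", "drama", "horror", "romance", "sci-fi"]

def normalize(scores):
    """Turn raw per-genre classifier scores into a distribution."""
    total = sum(scores)
    return [s / total for s in scores]

def interestingness(genre_dist):
    """Toy stand-in for the trained classifier: segments with a peaked
    (non-neutral) genre profile score higher than near-uniform ones,
    measured as L1 distance from the uniform distribution."""
    uniform = 1.0 / len(genre_dist)
    return sum(abs(p - uniform) for p in genre_dist)

def rank_segments(segments):
    """Rank segments (id -> raw per-genre scores) by interestingness."""
    scored = {sid: interestingness(normalize(raw))
              for sid, raw in segments.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Example: segment "s1" is strongly horror-like, "s2" is genre-neutral.
segments = {"s1": [0.1, 0.1, 5.0, 0.1, 0.1],
            "s2": [1.0, 1.0, 1.0, 1.0, 1.0]}
print(rank_segments(segments))  # the peaked segment "s1" ranks first
```

Under this toy scoring rule the genre-distinctive segment is ranked above the neutral one, mirroring the paper's hypothesis that genre-characteristic segments are perceived as more interesting.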


Type:
Conference
City:
La Rochelle
Date:
2018-09-04
Department:
Data Science
Eurecom Ref:
5657
Copyright:
© 2018 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
PERMALINK : https://www.eurecom.fr/publication/5657