Deep features for multimodal emotion classification

Tiwari, Shriman Narayan; Duong, Ngoc Q. K.; Lefebvre, Frédéric; Demarty, Claire-Hélène; Huet, Benoit; Chevallier, Louis

Understanding human emotion when perceiving audiovisual content is an exciting and important research avenue, and there have been recent attempts to predict the emotion elicited by video clips or movies. While most existing approaches either focus on a single modality, i.e., exploit only audio or visual data, or build on a multimodal scheme with late fusion, we propose a multimodal framework with an early fusion scheme and target an emotion classification task. The proposed mechanism offers three advantages: it handles (1) the variation in video length, (2) the imbalance of audio and visual feature sizes, and (3) the middle-level fusion of audio and visual information, such that a higher-level feature representation can be learned jointly from the two modalities for classification. We evaluate the proposed approach on an international benchmark, the MediaEval 2015 Affective Impact of Movies task, and show that it outperforms most state-of-the-art systems on arousal accuracy while using a much smaller feature size.
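
The abstract gives only a high-level description of the fusion scheme. As an illustration, the sketch below shows one plausible way to realize such a middle-level fusion classifier in PyTorch; the feature dimensions, temporal pooling strategy, layer sizes, and class count are assumptions made for the example, not details taken from the paper.

# Hypothetical sketch of a middle-level fusion classifier.
# All dimensions and layer choices are illustrative assumptions,
# not the architecture reported in the paper.
import torch
import torch.nn as nn

class MidFusionClassifier(nn.Module):
    def __init__(self, audio_dim=128, visual_dim=1024,
                 hidden_dim=256, num_classes=3):
        super().__init__()
        # Per-modality projections compensate for the imbalance in
        # audio vs. visual feature sizes before fusion.
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.visual_proj = nn.Linear(visual_dim, hidden_dim)
        # Joint layers learn a higher-level representation from
        # the concatenated (fused) modalities.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, audio_seq, visual_seq):
        # Temporal average pooling yields fixed-size vectors
        # regardless of video length.
        audio = self.audio_proj(audio_seq.mean(dim=1))
        visual = self.visual_proj(visual_seq.mean(dim=1))
        fused = torch.cat([audio, visual], dim=-1)  # middle-level fusion
        return self.classifier(fused)

# Example: one clip with 50 audio frames and 30 visual frames.
model = MidFusionClassifier()
audio_seq = torch.randn(1, 50, 128)
visual_seq = torch.randn(1, 30, 1024)
logits = model(audio_seq, visual_seq)  # shape: (1, 3)

In this sketch, temporal pooling addresses the variable video length, and the per-modality projections bring the differently sized audio and visual features into a common space before the joint layers, mirroring the three advantages listed in the abstract.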


Type: Conference
Date: 2016-06-20
Department: Data Science
Eurecom Ref: 4928
Copyright: © EURECOM. Personal use of this material is permitted. The definitive version of this paper was published on HAL and is available at:

PERMALINK : https://www.eurecom.fr/publication/4928