Combining textual and visual modeling for predicting media memorability

Reboud, Alison; Harrando, Ismail; Laaksonen, Jorma; Francis, Danny; Troncy, Raphaël; Mantecon, Hector Laria

MediaEval 2019, 10th MediaEval Benchmarking Initiative for Multimedia Evaluation Workshop, 27-29 October 2019, Sophia Antipolis, France

This paper describes a multimodal approach proposed by the MeMAD team for the MediaEval 2019 “Predicting Media Memorability” task. Our best approach is a weighted average method combining predictions made separately from visual and textual representations of videos. In particular, we augmented the provided textual descriptions with automatically generated deep captions. For long-term memorability, we obtained better scores using the short-term predictions rather than the long-term ones. Our best model achieves Spearman scores of 0.522 and 0.277 for the short-term and long-term prediction tasks, respectively.
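The weighted-average combination and the Spearman evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fusion weight `w_visual`, and the toy scores are hypothetical, and the abstract does not state which weights or models were actually used.

```python
from typing import List, Sequence


def weighted_average(visual: Sequence[float],
                     textual: Sequence[float],
                     w_visual: float = 0.5) -> List[float]:
    """Late fusion: blend two per-video score lists with a single weight.

    w_visual is a hypothetical parameter; the textual model gets 1 - w_visual.
    """
    if len(visual) != len(textual):
        raise ValueError("score lists must have the same length")
    if not 0.0 <= w_visual <= 1.0:
        raise ValueError("w_visual must lie in [0, 1]")
    return [w_visual * v + (1.0 - w_visual) * t
            for v, t in zip(visual, textual)]


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    std_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    if std_x == 0.0 or std_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (std_x * std_y)


# Toy values, made up for illustration only (not data from the paper).
visual_scores = [0.91, 0.72, 0.85, 0.60]
textual_scores = [0.88, 0.80, 0.70, 0.65]
ground_truth = [0.95, 0.78, 0.81, 0.58]

fused = weighted_average(visual_scores, textual_scores, w_visual=0.6)
score = spearman(fused, ground_truth)
```

In a setup like this, the weight would typically be chosen on a held-out validation split by keeping the value that gives the highest Spearman score.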

Type: Conference
City: Sophia Antipolis
Date: 2019-10-27
Department: Data Science
Eurecom Ref: 6062

CEUR