This paper describes a multimodal approach proposed by the MeMAD team for the MediaEval 2020 “Predicting Media Memorability” task. Our best approach is a weighted average of predictions made separately from visual, audio, textual, and visiolinguistic representations of videos. It achieves Spearman scores of 0.101 and 0.078 on the short-term and long-term prediction tasks, respectively.
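
As an illustration of the weighted average idea and of the Spearman metric mentioned above, the following is a minimal sketch, not the team's implementation. The function names, the toy scores, and the weights are hypothetical; the abstract does not state how the weights were chosen, so here they are simply passed in by the caller.

```python
from typing import Dict, List, Sequence


def weighted_average_fusion(
    predictions: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Combine per-modality memorability scores into one score per video.

    `predictions` maps a modality name to its list of per-video scores.
    `weights` maps the same names to non-negative weights (assumed given).
    """
    total = sum(weights[name] for name in predictions)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_videos = len(next(iter(predictions.values())))
    fused = []
    for i in range(n_videos):
        score = sum(weights[name] * scores[i] for name, scores in predictions.items())
        fused.append(score / total)  # normalise so the weights act as proportions
    return fused


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Toy scores for three videos from two of the four modalities (made-up numbers).
    toy_predictions = {"visual": [0.9, 0.8, 0.7], "audio": [0.5, 0.9, 0.7]}
    toy_weights = {"visual": 3.0, "audio": 1.0}
    fused_scores = weighted_average_fusion(toy_predictions, toy_weights)
    ground_truth = [0.85, 0.90, 0.60]  # hypothetical annotations
    print(fused_scores, spearman(fused_scores, ground_truth))
```

Because Spearman correlation depends only on the ranking of the videos, rescaling all weights by a common factor leaves the score unchanged; only the relative weight given to each modality matters.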
Predicting media memorability with audio, video, and text representations
MEDIAEVAL 2020, Multimedia Evaluation Benchmark, 14-15 December 2020, Virtual Event
Permalink: https://www.eurecom.fr/publication/6438