Multimodal and multilingual understanding of smells using VilBERT and mUNITER

Akdemir, Kiymet; Hürriyetoglu, Ali; Troncy, Raphaël; Paccosi, Teresa; Menini, Stefano; Zinnen, Mathias; Christlein, Vincent

MediaEval 2022, Multimedia Evaluation Workshop, 12-13 January 2023, Bergen, Norway (Hybrid Event)

We evaluate state-of-the-art multimodal models for detecting common olfactory references in multilingual text and images, within the scope of the Multimodal Understanding of Smells in Texts and Images (MUSTI) task at MediaEval’22. The goal of MUSTI Subtask 1 is to classify paired texts and images according to whether or not they refer to the same smell source. We approach this task as a Visual Entailment problem and evaluate the performance of the English model ViLBERT and the multilingual model mUNITER on MUSTI Subtask 1. Although the base ViLBERT and mUNITER models perform worse than a dummy baseline, fine-tuning these models improves performance significantly in almost all scenarios. We find that fine-tuning mUNITER with SNLI-VE and MUSTI train data performs better than the other configurations we implemented. Our experiments demonstrate that the task presents some challenges, but it is by no means impossible. Our code is available at https://github.com/Odeuropa/musti-eval-baselines.

Type:

Conference

City:

Bergen

Date:

2023-01-12

Department:

Data Science

Eurecom Ref:

7181

CEUR