PicSOM and EURECOM experiments in TRECVID 2019

Mantecon, Hector Laria; Laaksonen, Jorma; Francis, Danny; Huet, Benoit
TRECVID 2019, 23rd International Workshop on Video Retrieval Evaluation, 12-13 November 2019, Gaithersburg, MD, USA

This year, the PicSOM and EURECOM teams participated only in the Description Generation subtask of the Video to Text Description (VTT) task. Each group labeled one or two of its runs as "MeMAD" submissions, after the joint EU H2020 research project of that name. In total, the PicSOM team submitted four runs and EURECOM one run. The goal of the PicSOM submissions was to study the effect of using image features, video features, or both. The goal of the EURECOM submission was to experiment with the use of Curriculum Learning in video captioning. The five submitted runs are as follows:

• PICSOM.1-MEMAD.PRIMARY: uses ResNet and I3D features for initialising the LSTM generator, and is trained on MS COCO + TGIF using self-critical loss,

• PICSOM.2-MEMAD: uses I3D features as initialisation, and is trained on TGIF using self-critical loss,

• PICSOM.3: uses ResNet features as initialisation, and is trained on MS COCO + TGIF using self-critical loss,

• PICSOM.4: is the same as PICSOM.1-MEMAD.PRIMARY except that the loss function used is cross-entropy,

• EURECOM.MEMAD.PRIMARY: uses I3D features to initialise a GRU generator, and is trained on TGIF + MSR-VTT + MSVD with cross-entropy and curriculum learning.

The runs aim to compare the cross-entropy and self-critical training loss functions and to show whether still image and video features can be used together even though the COCO dataset does not allow the extraction of I3D video features. The results suggest that using both video and still image features yields better captioning results than using either modality alone. The proposed Curriculum Learning process does not seem to be beneficial.
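As a rough illustration of the two training losses compared in the runs, the following minimal sketch contrasts cross-entropy with a self-critical loss in its commonly used form, where the reward of a greedily decoded caption serves as the baseline for a sampled caption. The function names, probabilities, and reward values are hypothetical and are not taken from the submitted systems; in practice the reward would be a caption metric such as CIDEr, and the log-probabilities would come from the LSTM or GRU generator.

```python
import math


def cross_entropy_loss(token_log_probs):
    """Negative log-likelihood of the ground-truth caption tokens."""
    return -sum(token_log_probs)


def self_critical_loss(sampled_log_probs, sampled_reward, greedy_reward):
    """Policy-gradient loss with the greedy caption's reward as baseline.

    A sampled caption that scores higher than the greedy one gets a
    positive advantage, so minimising the loss raises its probability.
    """
    advantage = sampled_reward - greedy_reward
    return -advantage * sum(sampled_log_probs)


# Hypothetical per-token probabilities the generator assigns to the
# ground-truth caption (assumption, for illustration only).
gt_log_probs = [math.log(0.5), math.log(0.25)]
ce = cross_entropy_loss(gt_log_probs)

# Hypothetical sampled caption with assumed metric scores for the
# sampled and greedy captions.
sampled_log_probs = [math.log(0.5), math.log(0.5)]
sc = self_critical_loss(sampled_log_probs, sampled_reward=0.9, greedy_reward=0.7)

print(f"cross-entropy loss: {ce:.3f}")
print(f"self-critical loss: {sc:.3f}")
```

The key difference is that cross-entropy only needs ground-truth tokens, whereas the self-critical loss optimises the evaluation metric directly and vanishes when the sampled caption scores no better or worse than the greedy one.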


Type: Conference
City: Gaithersburg
Date: 2019-11-12
Department: Data Science
Eurecom Ref: 6109
Copyright:
© NIST. Personal use of this material is permitted. The definitive version of this paper was published in TRECVID 2019, 23rd International Workshop on Video Retrieval Evaluation, 12-13 November 2019, Gaithersburg, MD, USA and is available at :

PERMALINK : https://www.eurecom.fr/publication/6109