Graduate School and Research Center in Digital Sciences

Gated recurrent capsules for visual word embeddings

Francis, Danny; Huet, Benoit; Merialdo, Bernard

MMM 2019, 25th International Conference on MultiMedia Modeling, January 8-11, 2019, Thessaloniki, Greece

The caption retrieval task can be defined as follows: given a set of images I and a set of describing sentences S, for each image i in I we ought to find the sentence in S that best describes i. The most commonly applied method to solve this problem is to build a multimodal space and to map each image and each sentence into that space, so that they can be compared easily. A non-conventional model called Word2VisualVec has been proposed recently: instead of mapping images and sentences to a multimodal space, its authors mapped sentences directly to a space of visual features. Advances in the computation of visual features let us infer that such an approach is promising. In this paper, we propose a new model following that unconventional approach, based on Gated Recurrent Capsules (GRCs), designed as an extension of Gated Recurrent Units (GRUs). We show that GRCs outperform GRUs on the caption retrieval task. We also argue that GRCs present great potential for other applications.
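The retrieval step defined in the abstract can be illustrated with a minimal sketch. It assumes that sentences have already been mapped into the space of visual features and that candidates are ranked by cosine similarity; the abstract does not name the similarity measure, so that choice is an assumption. The sentence encoder itself (GRC or GRU) is not implemented here, and the function names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_captions(image_features, sentence_embeddings):
    """For each image i in I, return the index of the sentence in S that scores highest.

    image_features: one visual feature vector per image.
    sentence_embeddings: one vector per sentence, assumed to be already
    projected into the same visual feature space by some sentence encoder.
    """
    best = []
    for img in image_features:
        scores = [cosine(img, sent) for sent in sentence_embeddings]
        best.append(max(range(len(scores)), key=scores.__getitem__))
    return best


# Toy data (hypothetical): two images and two sentences in a 3-dimensional feature space.
image_features = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1]]
sentence_embeddings = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.3]]

result = retrieve_captions(image_features, sentence_embeddings)
print(result)  # [1, 0]
```

Because sentences are projected directly into the visual feature space, the image vectors are used as they are; only the sentence side needs a learned mapping.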


Title: Gated recurrent capsules for visual word embeddings
Keywords: multimodal embeddings, deep learning, capsule networks
Type: Conference
Language: English
City: Thessaloniki
Country: GREECE
Date:
Department: Data Science
Eurecom ref: 5719
Copyright: © ACM, 2019. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in MMM 2019, 25th International Conference on MultiMedia Modeling, January 8-11, 2019, Thessaloniki, Greece http://doi.org/10.1007/978-3-030-05716-9_23
Bibtex:
@inproceedings{EURECOM+5719,
  doi = {http://doi.org/10.1007/978-3-030-05716-9_23},
  year = {2019},
  title = {{G}ated recurrent capsules for visual word embeddings},
  author = {{F}rancis, {D}anny and {H}uet, {B}enoit and {M}erialdo, {B}ernard},
  booktitle = {{MMM} 2019, 25th {I}nternational {C}onference on {M}ulti{M}edia {M}odeling, {J}anuary 8-11, 2019, {T}hessaloniki, {G}reece},
  address = {{T}hessaloniki, {GREECE}},
  month = {01},
  url = {http://www.eurecom.fr/publication/5719}
}