
Embedding images and sentences in a common space with a recurrent capsule network

Francis, Danny; Huet, Benoit; Merialdo, Bernard

CBMI 2018, International Conference on Content-Based Multimedia Indexing, 4-6 September 2018, La Rochelle, France

Associating texts and images is an easy and intuitive task for a human being, but it is much harder to get a computer to accomplish it. One of the difficulties is finding a common representation for images and sentences. Building on recent research on capsule networks, we define a novel model to tackle this problem. The model is trained on the Flickr8k dataset and compared with other recent models on the Image Retrieval and Image Annotation (or Sentence Retrieval) tasks. We propose a new recurrent architecture, inspired by capsule networks, to replace the traditional LSTM/GRU, and we show that it improves performance. Moreover, we show that the value of our model goes beyond its performance: its intrinsic characteristics can explain why it performs particularly well on the Image Annotation task. In addition, we propose a routing procedure between capsules that is fully learned during the training of our model.
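Once images and sentences live in a common space, both tasks reduce to ranking by similarity: Image Annotation uses an image as the query and ranks sentences, while Image Retrieval uses a sentence as the query and ranks images. The sketch below illustrates this scoring only. It assumes cosine similarity and the Recall@K metric, which are common choices for Flickr8k-style benchmarks but are not stated on this page; the two-dimensional vectors are toy values, not outputs of the recurrent capsule network, and every function name is hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine_matrix(images, sentences):
    """sim[i][j] = cosine similarity between image i and sentence j."""
    imgs = [l2_normalize(v) for v in images]
    sents = [l2_normalize(v) for v in sentences]
    return [[sum(a * b for a, b in zip(u, s)) for s in sents] for u in imgs]

def transpose(m):
    """Swap rows and columns so sentences become the queries."""
    return [list(col) for col in zip(*m)]

def recall_at_k(sim, ground_truth, k):
    """Fraction of queries (rows of sim) with at least one relevant item in the top k.

    ground_truth[q] is the set of item indices relevant to query q.
    """
    hits = 0
    for q, scores in enumerate(sim):
        ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if any(j in ground_truth[q] for j in ranking[:k]):
            hits += 1
    return hits / len(sim)

# Toy embeddings (assumed values): 2 images, 4 sentences, 2 captions per image.
images = [[1.0, 0.0], [0.0, 1.0]]
sentences = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.9], [0.6, 0.5]]
caption_owner = [0, 0, 1, 1]  # sentence j describes image caption_owner[j]

sim = cosine_matrix(images, sentences)

# Image Annotation: each image query should retrieve one of its own captions.
annotation_gt = [{j for j, o in enumerate(caption_owner) if o == i} for i in range(len(images))]
# Image Retrieval: each sentence query should retrieve the image it describes.
retrieval_gt = [{o} for o in caption_owner]

print("Annotation R@1:", recall_at_k(sim, annotation_gt, 1))
print("Retrieval  R@1:", recall_at_k(transpose(sim), retrieval_gt, 1))
print("Retrieval  R@2:", recall_at_k(transpose(sim), retrieval_gt, 2))
```

On these toy vectors, Annotation R@1 is 1.0 while Retrieval R@1 is 0.75, because the fourth sentence lies closer to the wrong image. The same similarity matrix can therefore score very differently depending on which modality is the query, which is why the two tasks are reported separately.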


Title: Embedding images and sentences in a common space with a recurrent capsule network
Keywords: multimodal embeddings, deep learning, multimedia retrieval
Type: Conference
Language: English
City: La Rochelle
Country: FRANCE
Date: September 2018
Department: Data Science
Eurecom ref: 5644
Copyright: © 2018 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Bibtex:
@inproceedings{EURECOM+5644,
  doi = {10.1109/CBMI.2018.8516480},
  year = {2018},
  title = {{E}mbedding images and sentences in a common space with a recurrent capsule network},
  author = {{F}rancis, {D}anny and {H}uet, {B}enoit and {M}erialdo, {B}ernard},
  booktitle = {{CBMI} 2018, {I}nternational {C}onference on {C}ontent-{B}ased {M}ultimedia {I}ndexing, 4-6 {S}eptember 2018, {L}a {R}ochelle, {F}rance},
  address = {{L}a {R}ochelle, {FRANCE}},
  month = {09},
  url = {http://www.eurecom.fr/publication/5644}
}