Embedding Images and Sentences in a Common Space with a Deep Recurrent Capsule Network

Danny FRANCIS - PhD Student Data Science

Date: -
Location: Eurecom

Abstract - Associating texts and images is an easy and intuitive task for a human being, but it is difficult for a computer. One of the main difficulties is finding a common representation for images and sentences. Building on recent research on capsule networks, we define a novel model to tackle this problem. The model is trained on the Flickr8k dataset and compared with other recent models. We propose a new recurrent architecture inspired by capsule networks to replace the traditional LSTM/GRU and show how it leads to improved performance. We also give a routing procedure between capsules that is fully learned during the training of our model.

Bio - Danny Francis is currently a PhD student in the Data Science department of EURECOM, under the supervision of Professor Bernard Merialdo and Professor Benoit Huet. He graduated as an engineer from Télécom ParisTech and EURECOM in 2016. His PhD work deals with automatically analyzing and structuring multimedia data. It is part of ANR’s GaFes project.