Graduate engineering school and research center in digital science

Visual versus textual embedding for video retrieval

Francis, Danny; Pidou, Paul; Merialdo, Bernard; Huet, Benoit

ACIVS 2017, Advanced Concepts for Intelligent Vision Systems, September 18-21, 2017, Antwerp, Belgium

This paper compares several approaches to natural-language access to video databases. We present two main strategies. The first is visual: keyframes are compared with images retrieved from Google Images. The second is textual: a text description is generated for each keyframe, and these descriptions are compared with the query. We study the effect of several parameters and find that substantial improvement is possible by choosing the right strategy for a given topic. Finally, we investigate a method for choosing the right approach for a given topic.


Title: Visual versus textual embedding for video retrieval
Type: Conference
Language: English
City: Antwerp
Country: Belgium
Date:
Department: Data Science
Eurecom ref: 5319
Copyright: © Springer. Personal use of this material is permitted. The definitive version of this paper was published in ACIVS 2017, Advanced Concepts for Intelligent Vision Systems, September 18-21, 2017, Antwerp, Belgium and is available at :
Bibtex:
@inproceedings{EURECOM+5319,
  year = {2017},
  title = {{V}isual versus textual embedding for video retrieval},
  author = {{F}rancis, {D}anny and {P}idou, {P}aul and {M}erialdo, {B}ernard and {H}uet, {B}enoit},
  booktitle = {{ACIVS} 2017, {A}dvanced {C}oncepts for {I}ntelligent {V}ision {S}ystems, {S}eptember 18-21, 2017, {A}ntwerp, {B}elgium},
  address = {{A}ntwerp, {BELGIQUE}},
  month = {09},
  url = {http://www.eurecom.fr/publication/5319}
}