Graduate School and Research Center in Digital Sciences

Fusion of multimodal embeddings for ad-hoc video search

Francis, Danny; Nguyen, Phuong Anh; Huet, Benoit; Ngo, Chong-Wah

ViRaL 2019, 1st International Workshop on Video Retrieval Methods and Their Limits, co-located with ICCV 2019, 28 October 2019, Seoul, Korea

The challenge of Ad-Hoc Video Search (AVS) stems from free-form (i.e., no pre-defined vocabulary) and freestyle (i.e., natural language) query descriptions. Bridging the semantic gap between AVS queries and videos is highly difficult, as evidenced by the low retrieval accuracy reported in the AVS benchmark at TRECVID. In this paper, we study a new method for fusing multimodal embeddings that were derived from completely disjoint datasets. The method is evaluated on two datasets for two distinct tasks: on MSR-VTT for retrieving a unique video, and on V3C1 for retrieving multiple videos.


Title:Fusion of multimodal embeddings for ad-hoc video search
Type:Conference
Language:English
City:Seoul
Country:KOREA, REPUBLIC OF
Department:Data Science
Eurecom ref:6052
Copyright: © NIST. Personal use of this material is permitted. The definitive version of this paper was published in ViRaL 2019, 1st International Workshop on Video Retrieval Methods and Their Limits, co-located with ICCV 2019, 28 October 2019, Seoul, Korea and is available at:
Bibtex: @inproceedings{EURECOM+6052, year = {2019}, title = {{F}usion of multimodal embeddings for ad-hoc video search}, author = {{F}rancis, {D}anny and {N}guyen, {P}huong {A}nh and {H}uet, {B}enoit and {N}go, {C}hong-{W}ah}, booktitle = {{V}i{R}a{L} 2019, 1st {I}nternational {W}orkshop on {V}ideo {R}etrieval {M}ethods and {T}heir {L}imits, co-located with {ICCV} 2019, 28 {O}ctober 2019, {S}eoul, {K}orea}, address = {{S}eoul, {KOREA}, {REPUBLIC} {OF}}, month = {10}, url = {http://www.eurecom.fr/publication/6052} }