IRIM at TRECVID 2012: Semantic indexing and instance search

Ballas, Nicolas; Labbe, Benjamin; Shabou, Aymen; Le Borgne, Herve; Gosselin, Philippe; Redi, Miriam; Merialdo, Bernard; Jegou, Hervé; Delhumeau, Jonathan; Vieux, Rémi; Mansencal, Boris; Benois-Pineau, Jenny; Ayache, Stéphane; Hamadi, Abdelkader; Safadi, Bahjat; Thollard, Franck; Derbas, Nadia; Quenot, Georges; Bredin, Hervé; Cord, Matthieu; Gao, Boyang; Zhu, Chao; Tang, Yuxing; Dellandréa, Emmanuel; Bichot, Charles-Edmond; Chen, Liming; Benoît, Alexandre; Lambert, Patrick; Strat, Tiberius; Razik, Joseph; Paris, Sébastien; Glotin, Hervé; Trung, Tran Ngo; Petrovska, Dijana; Chollet, Gérard; Stoian, Andrei; Crucianu, Michel
TRECVID 2012, TREC Video Retrieval Evaluation workshop, November 2012, Gaithersburg, MD, United States

The IRIM group is a consortium of French teams working on Multimedia Indexing and Retrieval. This paper describes its participation in the TRECVID 2012 semantic indexing and instance search tasks.

For the semantic indexing task, our approach uses a six-stage processing pipeline to compute, for each video shot, a score for the likelihood that the shot contains a target concept. These scores are then used to produce a ranked list of the images or shots most likely to contain the target concept. The pipeline comprises the following steps: descriptor extraction, descriptor optimization, classification, fusion of descriptor variants, higher-level fusion, and re-ranking. We evaluated a number of different descriptors and tried different fusion strategies. The best IRIM run has a Mean Inferred Average Precision of 0.2378, which ranked us 4th out of 16 participants.

For the instance search task, our approach uses two steps. First, the individual methods of the participants are used to compute the similarity between an example image of an instance and the keyframes of a video clip. Then a two-step fusion method combines these individual results into a score for the likelihood that the instance appears in the video clip. These scores are used to obtain a ranked list of the clips most likely to contain the queried instance. The best IRIM run has a MAP of 0.1192, which ranked us 29th out of 79 fully automatic runs.
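The fusion stages described above combine per-descriptor scores into a single ranked list. As a minimal illustration only (the paper does not specify the exact fusion operator; the function name, min-max normalization, and weighted arithmetic mean below are assumptions), a score-level late fusion step could be sketched as:

```python
import numpy as np

def late_fusion_rank(score_lists, weights=None, top_k=2000):
    """Hypothetical sketch: fuse per-descriptor classifier scores for each
    shot into one relevance score, then rank shots by decreasing score.
    score_lists maps a descriptor name to an array of per-shot scores."""
    names = list(score_lists)
    scores = np.stack([np.asarray(score_lists[n], dtype=float) for n in names])
    # Min-max normalize each descriptor's scores to [0, 1] so they are comparable.
    mins = scores.min(axis=1, keepdims=True)
    rng = scores.max(axis=1, keepdims=True) - mins
    rng[rng == 0] = 1.0
    scores = (scores - mins) / rng
    # Weighted arithmetic mean across descriptors (uniform weights by default).
    w = np.ones(len(names)) if weights is None else np.asarray(weights, dtype=float)
    fused = w @ scores / w.sum()
    # Return shot indices sorted from most to least likely.
    return np.argsort(-fused)[:top_k]

# Toy example: three descriptors scoring five shots.
ranking = late_fusion_rank({
    "sift_bow":   [0.2, 0.9, 0.1, 0.4, 0.7],
    "color_hist": [0.3, 0.8, 0.2, 0.5, 0.6],
    "audio":      [0.1, 0.7, 0.3, 0.6, 0.5],
})
print(ranking.tolist())  # prints [1, 4, 3, 2, 0]
```

The normalization step matters because raw classifier outputs from different descriptors live on different scales; fusing unnormalized scores would let one descriptor dominate the ranking.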

© NIST. Personal use of this material is permitted. The definitive version of this paper was published in TRECVID 2012, TREC Video Retrieval Evaluation workshop, November 2012, Gaithersburg, MD, United States and is available at: