Exploring intra-bow statistics for improving visual categorization

Niaz, Usman; Merialdo, Bernard
WIAMIS 2013, 14th International Workshop on Image and Audio Analysis for Multimedia Interactive Services, July 3-5, 2013, Paris, France

Research in video retrieval systems is largely inspired by state-of-the-art text retrieval: high-dimensional descriptors are quantized to visual words, yielding a Bag of Words (BOW) histogram for an image. With a small BOW model, potentially different descriptors can be assigned to the same visual word. Recently, however, refinements have been proposed that recover some of the representation loss of this simplistic model of visual description by studying the distribution of descriptors within the visual words [1, 2, 3]. Following the same approach, we enhance the BOW by encoding the position of each descriptor inside its quantized cell relative to the cell's centroid. Embedding this information in the image representation increases the precision of video concept detection. We compare our method to a BOW-based baseline on the TRECVID 2007 and TRECVID 2010 [4] datasets and show that adding the proposed refinement always improves performance on the semantic indexing task. We also compare our method to that of [3] and show that it outperforms Hamming Embedding Similarity based classification on the TRECVID 2007 dataset and achieves comparable performance on the TRECVID 2010 set.
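
To make the idea of intra-cell statistics concrete, the following is a minimal Python sketch and not the authors' implementation. The abstract does not specify how a descriptor's position relative to the centroid is encoded, so the sketch assumes one simple choice: the mean offset (descriptor minus centroid) per visual word, stored alongside the usual BOW count. The function name `bow_with_intra_cell_stats` and the toy data are hypothetical.

```python
import numpy as np


def bow_with_intra_cell_stats(descriptors, centroids):
    """Return (histogram, mean_offsets) for the descriptors of one image.

    ASSUMPTION: the mean offset from the centroid is used here as an
    illustrative intra-cell statistic; the paper's actual encoding is
    not given in the abstract.
    """
    descriptors = np.asarray(descriptors, dtype=float)
    centroids = np.asarray(centroids, dtype=float)

    # Squared Euclidean distance from every descriptor to every centroid.
    diffs = descriptors[:, None, :] - centroids[None, :, :]
    dists = np.einsum("nkd,nkd->nk", diffs, diffs)

    # Hard assignment: each descriptor goes to its nearest visual word.
    words = dists.argmin(axis=1)

    # Standard BOW histogram: number of descriptors per visual word.
    k = centroids.shape[0]
    hist = np.bincount(words, minlength=k).astype(float)

    # Intra-cell statistic: average offset of the descriptors in each cell.
    residuals = descriptors - centroids[words]
    offsets = np.zeros_like(centroids)
    np.add.at(offsets, words, residuals)
    nonempty = hist > 0
    offsets[nonempty] /= hist[nonempty, None]

    return hist, offsets


# Toy example: two visual words in 2-D, three descriptors.
toy_centroids = [[0.0, 0.0], [10.0, 10.0]]
toy_descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
toy_hist, toy_offsets = bow_with_intra_cell_stats(toy_descriptors, toy_centroids)
```

Two images with identical BOW histograms can still differ in their per-word offsets, which is the kind of information a plain histogram discards. Empty cells keep a zero offset in this sketch.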

Data Science
Eurecom Ref:
© 2013 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

PERMALINK : https://www.eurecom.fr/publication/4053