
Marginal-based visual alphabets for local image descriptors aggregation

Redi, Miriam; Mérialdo, Bernard

MM 2011, 19th ACM International Conference on Multimedia, 28 November-1 December, 2011, Scottsdale, USA

Bag of Words (BOW) models are among the most effective methods for visual categorization. They use visual dictionaries to aggregate the set of local descriptors extracted from a given image. Despite their high discriminative ability, a major drawback of BOW remains the computational cost of building the visual dictionary, which requires clustering in the high-dimensional feature space. In this paper we introduce a fast, effective method for aggregating local image descriptors that is based on marginal approximations, i.e. approximations of the distribution of each descriptor component. We quantize each dimension of the feature space, obtaining a visual alphabet that we use to map the image descriptors into a fixed-length visual signature. Experimental results show that our new method outperforms the traditional BOW model in both accuracy and efficiency on the scene recognition task. Moreover, by combining the two models in a video retrieval system based on TRECVID 2010 [9], we find that marginal-based aggregation provides information complementary to BOW.
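The per-dimension quantization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the binning rule or the normalization, so the sketch assumes equal-frequency (quantile) cut points learned from training descriptors and a signature made of concatenated per-dimension letter frequencies. The names learn_alphabet, build_signature and n_bins are hypothetical.

```python
from bisect import bisect_right


def learn_alphabet(train_descriptors, n_bins):
    """Learn one set of cut points per descriptor dimension.

    Assumption: cut points are empirical quantiles of each marginal
    distribution; the abstract does not state the actual rule.
    """
    n_dims = len(train_descriptors[0])
    alphabet = []
    for d in range(n_dims):
        # Only the marginal (1-D) distribution of dimension d is used.
        values = sorted(desc[d] for desc in train_descriptors)
        cuts = [values[(k * len(values)) // n_bins] for k in range(1, n_bins)]
        alphabet.append(cuts)
    return alphabet


def build_signature(descriptors, alphabet):
    """Map a variable-size set of descriptors to a fixed-length vector.

    The result has n_dims * n_bins entries: for each dimension, the
    fraction of descriptors whose component falls in each bin ("letter").
    """
    n_bins = len(alphabet[0]) + 1
    sig = [0.0] * (len(alphabet) * n_bins)
    for desc in descriptors:
        for d, cuts in enumerate(alphabet):
            letter = bisect_right(cuts, desc[d])  # bin index in [0, n_bins)
            sig[d * n_bins + letter] += 1.0
    n = len(descriptors)
    return [count / n for count in sig] if n else sig


# Toy data: four 2-D training descriptors, three descriptors from one image.
train = [[0.0, 10.0], [1.0, 20.0], [2.0, 30.0], [3.0, 40.0]]
alphabet = learn_alphabet(train, n_bins=2)
image_descriptors = [[0.5, 35.0], [2.5, 15.0], [0.1, 5.0]]
sig = build_signature(image_descriptors, alphabet)
print(len(sig), sig)
```

Under these assumptions, learning the alphabet only requires sorting each dimension independently, which is where a saving over clustering in the full feature space would come from; the signature length depends only on the number of dimensions and bins, not on how many descriptors the image yields.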


Title: Marginal-based visual alphabets for local image descriptors aggregation
Keywords: Scene Recognition, Feature Extraction, CBIR
Type: Conference
Language: English
City: Scottsdale
Country: UNITED STATES
Date:
Department: Data Science
Eurecom ref: 3498
Copyright: © ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in MM 2011, 19th ACM International Conference on Multimedia, 28 November-1 December, 2011, Scottsdale, USA http://dx.doi.org/10.1145/2072298.2072032
Bibtex:
@inproceedings{EURECOM+3498,
  doi = {http://dx.doi.org/10.1145/2072298.2072032},
  year = {2011},
  title = {{M}arginal-based visual alphabets for local image descriptors aggregation},
  author = {{R}edi, {M}iriam and {M}{\'e}rialdo, {B}ernard},
  booktitle = {{MM} 2011, 19th {ACM} {I}nternational {C}onference on {M}ultimedia, 28 {N}ovember-1 {D}ecember, 2011, {S}cottsdale, {USA}},
  address = {{S}cottsdale, {UNITED} {STATES}},
  month = {11},
  url = {http://www.eurecom.fr/publication/3498}
}