The Bag-of-Visual-Words (BoW) feature has been shown to be
effective and is widely used in video concept detection due to
its discriminative ability by capturing the local information
in images. In the current approaches, all the words in the
visual vocabulary are treated equally for the detection of different
concepts. This fails to highlight the concept-specific
visual information, and thus limits the discriminative ability
of the BoW feature. In this paper, we propose an approach to
boost the performance of video concept detection based on
BoW. This is achieved by assigning different weights to the
visual words according to their informativeness for the detection
of different concepts. Kernel alignment score (KAS)
is used to measure the discriminative ability of SVM kernels,
and the weighting of visual words is formulated as a kernel optimization
problem. We show that the SVMs based on weighted visual
words with our approach outperform the uniform weighting
and TF-IDF weighting schemes, and the MAP for the 20
concepts from TRECVID 2009 high-level feature extraction
is significantly improved.
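
To illustrate how a kernel alignment score can reflect the effect of word weights, the following is a minimal sketch and not the implementation used in this work. It assumes the standard kernel-target alignment (the normalized Frobenius inner product between a kernel matrix and the label matrix yy^T) and, for simplicity, a weighted linear kernel over BoW histograms. The function names, the toy histograms, and the choice of kernel are all assumptions made for illustration.

```python
import numpy as np


def weighted_linear_kernel(X, w):
    """Weighted linear kernel K = X diag(w) X^T (an illustrative choice).

    X: (n_samples, n_words) BoW histograms; w: (n_words,) word weights.
    """
    return (X * w) @ X.T


def kernel_alignment(K, y):
    """Kernel-target alignment between K and the ideal kernel y y^T.

    y holds labels in {-1, +1}. Higher values mean that K agrees
    better with the label structure.
    """
    Y = np.outer(y, y)
    # np.linalg.norm on a 2-D array defaults to the Frobenius norm.
    return np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))


# Toy data (hypothetical): word 0 occurs only in positive samples,
# while words 1 and 2 are spread evenly over both classes.
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)
y = np.array([1, 1, -1, -1], dtype=float)

uniform = kernel_alignment(weighted_linear_kernel(X, np.ones(3)), y)
focused = kernel_alignment(
    weighted_linear_kernel(X, np.array([1.0, 0.0, 0.0])), y)
print(uniform, focused)
```

On this toy data the alignment rises from 0.25 under uniform weights to 0.5 when all weight is placed on the concept-specific word, which is the kind of signal a weight optimization could exploit.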
http://dx.doi.org/10.1145/1878137.1878150