Semantic concept detection using dense codeword motion

Tanase, Claudiu; Mérialdo, Bernard
ACIVS 2013, 15th International Conference, October 28-31, 2013, Poznan, Poland / Also published in LNCS, Volume 8192/2013

When detecting semantic concepts in video, much of the existing research in content-based classification uses keyframe information only. In particular, the combination of local features such as SIFT with the Bag of Words model is very popular among TRECVID participants. The few existing motion and spatiotemporal descriptors are computationally heavy and become impractical when applied to large datasets such as TRECVID. In this paper, we propose a way to efficiently combine positional motion obtained from optical flow in the keyframe with the information given by the Dense SIFT (DSIFT) Bag of Words feature. The proposed features work by spatially binning the motion vectors that belong to the same codeword into separate histograms describing movement direction (left, right, vertical, zero, etc.). The features are mapped with the homogeneous kernel map technique, which approximates the χ2 kernel, and the classifiers are then trained efficiently with a linear SVM. Using a simple linear fusion technique, we improve the Mean Average Precision (MAP) of the Bag of Words DSIFT classifier on the TRECVID 2010 Semantic Indexing benchmark from 0.0924 to 0.0972, an increase confirmed to be statistically significant by the standardized TRECVID randomization tests.
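The pipeline described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the direction rule (vectors shorter than 0.5 pixels count as "zero", |dy| > |dx| counts as "vertical"), the pooling over the whole keyframe, and the names `direction_bin`, `codeword_motion_histogram`, `chi2_feature_map`, `linear_fusion` with their parameters (`zero_thresh`, `order=2`, `period=0.5`, `weight`) are all hypothetical choices. The feature map follows the published homogeneous kernel map construction of Vedaldi and Zisserman for the χ2 kernel; the linear SVM training step is not shown.

```python
import math

# Hypothetical direction categories: the abstract lists "left, right, vertical,
# zero, etc.", so the exact set and rules used in the paper may differ.
DIRECTIONS = ("zero", "left", "right", "vertical")


def direction_bin(dx, dy, zero_thresh=0.5):
    """Map one optical-flow vector (dx, dy) to an index into DIRECTIONS.

    Assumed rule: short vectors count as "zero"; otherwise the dominant
    axis decides between vertical and left/right.
    """
    if math.hypot(dx, dy) < zero_thresh:
        return 0  # (nearly) static point
    if abs(dy) > abs(dx):
        return 3  # mostly vertical motion (up and down share one bin)
    return 1 if dx < 0 else 2  # mostly horizontal: left or right


def codeword_motion_histogram(codewords, flows, vocab_size, zero_thresh=0.5):
    """Concatenate one direction histogram per visual codeword, L1-normalized.

    codewords[i] is the Bag of Words codeword assigned to dense SIFT point i;
    flows[i] = (dx, dy) is the optical-flow vector measured at that point.
    Points are pooled over the whole keyframe (no spatial grid in this sketch).
    """
    n_dir = len(DIRECTIONS)
    hist = [0.0] * (vocab_size * n_dir)
    for word, (dx, dy) in zip(codewords, flows):
        hist[word * n_dir + direction_bin(dx, dy, zero_thresh)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def chi2_feature_map(x, order=2, period=0.5):
    """Homogeneous kernel map for the additive chi2 kernel (Vedaldi & Zisserman).

    Each non-negative dimension v becomes 2*order+1 values; the dot product of
    two mapped vectors approximates sum_d 2*x_d*y_d / (x_d + y_d). The chi2
    spectrum is kappa(w) = sech(pi*w), sampled at w = j*period.
    """
    out = []
    for v in x:
        if v <= 0:
            out.extend([0.0] * (2 * order + 1))  # empty bins map to zeros
            continue
        log_v = math.log(v)
        out.append(math.sqrt(v * period))  # j = 0 term, kappa(0) = 1
        for j in range(1, order + 1):
            w = j * period
            amp = math.sqrt(2.0 * v * period / math.cosh(math.pi * w))
            out.append(amp * math.cos(w * log_v))
            out.append(amp * math.sin(w * log_v))
    return out


def linear_fusion(scores_a, scores_b, weight):
    """Weighted sum of two classifiers' scores (weight is an assumed parameter)."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(scores_a, scores_b)]
```

In this sketch, `chi2_feature_map(codeword_motion_histogram(...))` produces vectors that could be fed to any linear SVM: because their dot product approximates the χ2 kernel, a linear classifier stands in for a much costlier kernel SVM. The `order` parameter trades dimensionality (2*order+1 outputs per histogram bin) against approximation accuracy.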


DOI: 10.1007/978-3-319-02895-8_63
Type: Conference
City: Poznan
Date: 2013-10-28
Department: Data Science
Eurecom Ref: 4185
Copyright: © Springer. Personal use of this material is permitted. The definitive version of this paper was published in ACIVS 2013, 15th International Conference, October 28-31, 2013, Poznan, Poland / Also published in LNCS, Volume 8192/2013 and is available at: http://dx.doi.org/10.1007/978-3-319-02895-8_63

Permalink: https://www.eurecom.fr/publication/4185