Semantic concept detection in large scale video collections is mostly achieved through a static analysis of selected keyframes. A popular choice for representing the visual content of an image is based on the pooling of local descriptors such as Dense SIFT. However, simple motion features such as optic flow can be extracted relatively easy from such keyframes. In this paper we propose an efficient addition to the DSIFT approach by including information derived from optic flow. Based on optic flow magnitude, we can estimate for each DSIFT patch whether it is static or moving. We modify the bag of words model used traditionally with DSIFT by creating two separate occurrence histograms instead of one: one for static patches and one for dynamic patches. We further refine this method by studying different separation thresholds and soft assignment, as well as different normalization techniques. Classifier score fusion is used to maximize the average precision of all these variants. Experimental results on the TRECVID Semantic Indexing collection show that by means of classifier fusion our method increases overall mean average precision of the DSIFT classifier from 0.061 to 0.106.
Introducing motion information in dense feature classifiers
WIAMIS 2013, 14th International Workshop on Image and Audio Analysis for Multimedia Interactive Services, 3-5 July, Paris, France
© 2013 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
PERMALINK : https://www.eurecom.fr/publication/4063