Visual material comprising images and videos is growing ever so rapidly over the internet and in our personal collections. This necessitates automatic understanding of the visual content which calls for the conception of intelligent methods to correctly index, search and retrieve images and videos. This thesis aims at improving the automatic detection of concepts in the internet videos by exploring all the available information and putting the most beneficial out of it to good use. Our contributions address various levels of the concept detection framework and can be divided into three main parts. The first part improves the Bag of Words (BOW) video representation model by proposing a novel BOW construction mechanism using concept labels and by including a refinement to the BOW signature based on the distribution of its elements. We then devise methods to incorporate knowledge from similar and dissimilar entities to build improved recognition models in the second part. Here we look at the potential information that the concepts share and build models for meta-concepts from which concept specific results are derived. This improves recognition for concepts lacking labeled examples. Lastly we contrive certain semi-supervised learning methods to get the best of the substantial amount of unlabeled data. We propose techniques to improve the semi-supervised cotraining algorithm with optimal view selection.
Cutting the visual world into bigger slices for improved video concept detection
© TELECOM ParisTech. Personal use of this material is permitted. The definitive version of this paper was published in Thesis and is available at :
PERMALINK : https://www.eurecom.fr/publication/4335