Graduate School and Research Center in Digital Sciences

Multi-level fusion for content-based semantic multimedia indexing and retrieval

Benmokhtar, Rachid


Today, access to documents in databases, archives and the Internet is mainly through textual data: image names or keywords. This kind of search is not without faults: spelling errors, omissions, etc. Recent advances in image analysis and machine learning could provide solutions such as feature-based indexing and retrieval, using color, shape, texture, motion, audio and text. These features are rich in information, especially from the semantic point of view.

This work deals with information retrieval and aims at the semantic indexing of multimedia documents: video shots and key-frames. Indexing is an operation that consists of extracting, representing and organizing the content of documents in a database. However, indexing is confronted with the "semantic gap" problem between low-level visual representations and high-level features (concepts). To limit the consequences of this issue, we introduced different types of descriptors into the system, while taking advantage of scientific advances in machine learning and multi-level fusion. Fusion combines heterogeneous information from multiple sources to obtain more complete, global and higher-quality information, and it can be applied at different levels of the classification process. Here, we studied low-level feature fusion, high-level feature fusion and decision fusion.

First, we present a state of the art of high-level fusion methods in indexing and search systems, in particular the adaptation of evidence theory to neural networks, which gives the Neural Network based on Evidence Theory (NNET). Compared to probabilistic methods, this theory provides two important pieces of information for decision-making: the belief degree and the system's ignorance. NNET was then improved by incorporating the relationship between descriptors and concepts, modeled by a weight vector based on entropy and perplexity. Combining this vector with the classifiers' outputs gives a new model called Perplexity-based Evidential Neural Network (PENN).

We have also introduced the important topic of ontology and inter-concept similarity (i.e. the study of relations between classes). Indeed, concepts are generally not expressed in isolation, and a strong correlation exists between certain classes. The first difficulty lies in using an ontology that describes the relationships between concepts. The second, which concerns us more, is the exploitation of this semantic information. Three types of information are used: low-level visual descriptors, co-occurrence and semantic similarities, in conjunction with a multimedia knowledge database for the semantic interpretation of video shots. The final system is called Ontological PENN.

Finally, we address the question of the usefulness of low-level fusion. This was only possible through a statistical study of the data before and after feature fusion. The proposed systems have been validated on data from TRECVid (NoE K-Space project) and on soccer videos provided by Orange-France Telecom Labs (CRE-Fusion project).
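
The two extra outputs that evidence theory offers over a purely probabilistic classifier, a belief degree per concept and an explicit ignorance term, can be made concrete with a small Dempster-Shafer sketch. The construction below is a generic prototype-based evidential classifier, assumed here for illustration and not taken from the thesis. Each hypothetical prototype (its center, its concept memberships, and the parameters `alpha` and `gamma`) contributes a simple mass function whose mass left on the whole frame Θ grows with distance. These contributions are then fused with Dempster's rule of combination. Prototype training is omitted.

```python
import math
from dataclasses import dataclass
from functools import reduce

# Mass function on the frame Theta = {concept_0, ..., concept_{K-1}} whose only
# focal sets are the singletons {concept_q} and Theta: (singleton masses, m(Theta)).
Mass = tuple[list[float], float]


@dataclass
class Prototype:
    center: list[float]      # hypothetical reference vector in descriptor space
    membership: list[float]  # degree of membership to each concept, sums to 1
    alpha: float = 0.9       # maximum support one prototype can give, 0 < alpha < 1
    gamma: float = 1.0       # decay of that support with squared distance


def prototype_mass(x: list[float], proto: Prototype) -> Mass:
    """Evidence from one prototype: the farther x is, the more mass stays on Theta."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, proto.center))
    s = proto.alpha * math.exp(-proto.gamma * d2)
    return [u * s for u in proto.membership], 1.0 - s


def dempster(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule restricted to singleton and Theta focal sets."""
    (s1, t1), (s2, t2) = m1, m2
    # {q} ∩ {q}, {q} ∩ Theta and Theta ∩ {q} all intersect to {q}.
    fused = [a * b + a * t2 + t1 * b for a, b in zip(s1, s2)]
    theta = t1 * t2
    # {q} ∩ {r} with q != r is empty: that mass is the conflict K, and the
    # remaining mass (1 - K) renormalises the result. It stays > 0 because alpha < 1.
    norm = sum(fused) + theta
    return [v / norm for v in fused], theta / norm


def evidential_classify(x: list[float], prototypes: list[Prototype]) -> Mass:
    """Return (belief in each concept, ignorance m(Theta)); needs >= 1 prototype.

    For a singleton, Bel({q}) = m({q}), so the singleton masses are the beliefs.
    """
    return reduce(dempster, (prototype_mass(x, p) for p in prototypes))


# Toy example: two hypothetical concepts, one prototype per concept.
protos = [Prototype([0.0, 0.0], [1.0, 0.0]), Prototype([1.0, 1.0], [0.0, 1.0])]
belief, ignorance = evidential_classify([0.1, 0.0], protos)   # near prototype 0
_, far_ignorance = evidential_classify([10.0, 10.0], protos)  # far from both
print(belief, ignorance, far_ignorance)
```

Because the mass left on Θ is reported separately instead of being spread over the concepts, an input far from every prototype comes out as "unknown" (ignorance close to 1) rather than receiving a confident but arbitrary concept probability.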
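
The use of co-occurrence between concepts can be illustrated in a generic way. The sketch estimates the conditional co-occurrence P(c_j | c_i) from training annotations, then lets confident detections of related concepts raise a concept's score. It rests on our own assumptions: the blending weight `lam`, the max-based support, and the toy concepts and annotations are all hypothetical. The system described above also draws on ontology-based semantic similarity and a multimedia knowledge base, which this sketch leaves out.

```python
def cooccurrence(annotations: list[list[int]]) -> list[list[float]]:
    """cond[i][j] estimates P(concept j present | concept i present)."""
    k = len(annotations[0])
    cond = [[0.0] * k for _ in range(k)]
    for i in range(k):
        n_i = sum(row[i] for row in annotations)
        if n_i == 0:
            continue  # concept never annotated: no co-occurrence evidence
        for j in range(k):
            both = sum(1 for row in annotations if row[i] and row[j])
            cond[i][j] = both / n_i
    return cond


def rescore(scores: list[float], cond: list[list[float]], lam: float = 0.3) -> list[float]:
    """Blend each concept's score with the strongest support from another concept."""
    k = len(scores)
    refined = []
    for j in range(k):
        support = max((scores[i] * cond[i][j] for i in range(k) if i != j), default=0.0)
        refined.append((1.0 - lam) * scores[j] + lam * support)
    return refined


# Hypothetical concepts (sky, outdoor, office) over five annotated training shots.
annotations = [[1, 1, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
cond = cooccurrence(annotations)
scores = [0.9, 0.4, 0.2]  # hypothetical detector outputs for one test shot
refined = rescore(scores, cond)
print(refined)
```

In this toy case the weak "outdoor" score rises, because "sky" always co-occurs with it in the annotations and is detected confidently. The "office" score drops, because that concept never co-occurs with the other two. The max-based support and the fixed `lam` are simplifications chosen for brevity.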


Title:Multi-level fusion for content-based semantic multimedia indexing and retrieval
Keywords:Video shots indexing, semantic gap, classification, feature fusion, classifier, fusion, inter-concepts similarity, ontology, LSCOM-lite, TRECVid.
Department:Multimedia Communications
Eurecom ref:2898
Copyright: © EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in Thesis and is available at :
Bibtex: @phdthesis{EURECOM+2898, year = {2009}, title = {{M}ulti-level fusion for content-based semantic multimedia indexing and retrieval }, author = {{B}enmokhtar, {R}achid}, school = {{T}hesis}, month = {06}, url = {} }