Multi-level fusion for content-based semantic multimedia indexing and retrieval

Benmokhtar, Rachid

Today, access to documents in databases, archives and on the Internet is mainly through textual data: image names or keywords. This kind of search has its faults: spelling errors, omissions, etc. Recent advances in image analysis and machine learning could provide solutions such as feature-based indexing and retrieval, using color, shape, texture, motion, audio and text. These features are rich in information, especially from the semantic point of view.

This work deals with information retrieval and aims at the semantic indexing of multimedia documents: video shots and key-frames. Indexing consists of extracting, representing and organizing the content of documents in a database.

However, indexing is confronted with the "semantic gap" between low-level visual representations and high-level features (concepts). To limit the consequences of this issue, we introduced different types of descriptors into the system, while taking advantage of scientific advances in machine learning and multi-level fusion. Indeed, fusion is used to combine heterogeneous information from multiple sources in order to obtain more complete, more global and higher-quality information. It can be applied at different levels of the classification process. Here, we studied low-level feature fusion, high-level feature fusion and decision fusion.

First, we present a state of the art of high-level fusion methods in indexing and search systems, in particular the adaptation of evidence theory to neural networks, thus the Neural Network based on Evidence Theory (NNET). Compared with probabilistic methods, this theory provides two pieces of information that are important for decision-making: belief degree and system ignorance. NNET has then been improved by incorporating the relationship between descriptors and concepts, modeled by a weight vector based on entropy and perplexity. Combining this vector with the classifier outputs gives a new model called Perplexity based Evidential Neural Network (PENN).

We have also introduced the important topic of ontology and inter-concept similarity (i.e. the study of relations between the classes). Indeed, concepts are generally not expressed in isolation, and a strong correlation exists between certain classes. The first difficulty lies in the use of an ontology that describes the relationships between concepts. The second, which concerns us more, lies in exploiting this semantic information. Three types of information are used: low-level visual descriptors, co-occurrence and semantic similarities, in conjunction with a multimedia knowledge database for the semantic interpretation of video shots. The final system is called Ontological PENN.

Finally, we respond to the question concerning the usefulness of low-level fusion. This was possible only through a statistical study of the data before and after feature fusion.

The proposed systems have been validated on data from TRECVid (NoE K-Space project) and on soccer videos provided by Orange-France Telecom Labs (CRE-Fusion project).
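
To illustrate the two quantities that evidence theory adds over a purely probabilistic output (belief degree and ignorance), the following is a minimal Python sketch of Dempster's rule of combination on a two-hypothesis frame. It is not the NNET architecture itself: the names `combine`, `belief`, `m_color` and `m_texture`, as well as the numeric masses, are hypothetical and chosen only for illustration. The mass assigned to the whole frame plays the role of ignorance.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping a frozenset (focal element)
    to its mass; masses are assumed to sum to 1.
    """
    joint = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            joint[inter] = joint.get(inter, 0.0) + wa * wb
        else:
            # Mass falling on the empty set measures the conflict.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize so that the combined masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in joint.items()}


def belief(m, hypothesis):
    """Belief degree: total mass of focal elements included in `hypothesis`."""
    return sum(w for s, w in m.items() if s <= hypothesis)


# Hypothetical frame of discernment for one concept detector.
YES = frozenset({"concept"})
NO = frozenset({"not_concept"})
FRAME = YES | NO  # mass placed here expresses ignorance

# Hypothetical outputs of two classifiers (e.g. color and texture).
m_color = {YES: 0.6, NO: 0.1, FRAME: 0.3}
m_texture = {YES: 0.5, NO: 0.2, FRAME: 0.3}

fused = combine(m_color, m_texture)
print("belief in concept:", belief(fused, YES))
print("remaining ignorance:", fused[FRAME])
```

In this sketch, the fused mass on `FRAME` is smaller than that of either source, which reflects how combining agreeing sources reduces ignorance, something a single probability value cannot express separately from the decision score.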


Data Science
© TELECOM ParisTech. Personal use of this material is permitted. The definitive version of this paper was published in Thesis and is available at :