Novel methods for semantic and aesthetic multimedia retrieval

Redi, Miriam

Can we help computers understand the meaning of an image, and perceive its beauty?
In the internet era, the computerized classification and discovery of image properties are of crucial importance for automatically organizing the huge volume of visual data surrounding us. Multimedia Information Retrieval (MMIR) is a research field that helps build intelligent systems that automatically recognize the content of an image and its characteristics.
In general, this is achieved through a chain process: first, low-level features are extracted and pooled into compact image signatures. Then, machine learning techniques are used to build models able to distinguish between different image categories based on such signatures. Such models are finally used to recognize the properties of new images.
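The chain described above can be sketched in miniature. The snippet below is an illustrative toy, not the thesis's actual pipeline: the "local features" are simulated stand-ins for real low-level descriptors, pooling is plain averaging, and the "model" is a nearest-centroid classifier; all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_local_features(image, n_patches=16, dim=8):
    # Stand-in for a real low-level descriptor (e.g. SIFT):
    # one feature vector per image patch, here simulated around the image mean.
    return rng.normal(loc=image.mean(), scale=1.0, size=(n_patches, dim))

def pool_signature(local_features):
    # Step 1 of the chain: pool patch descriptors into one compact signature.
    return local_features.mean(axis=0)

def train_centroids(signatures, labels):
    # Step 2: a minimal "model" -- one centroid per image category.
    return {c: signatures[labels == c].mean(axis=0) for c in np.unique(labels)}

def predict(model, signature):
    # Step 3: recognize a new image as the category with the closest centroid.
    return min(model, key=lambda c: np.linalg.norm(model[c] - signature))

# Two toy categories: dark images (label 0) and bright images (label 1).
images = [np.full((4, 4), 0.0) for _ in range(5)] + \
         [np.full((4, 4), 5.0) for _ in range(5)]
labels = np.array([0] * 5 + [1] * 5)

signatures = np.array([pool_signature(extract_local_features(im)) for im in images])
model = train_centroids(signatures, labels)

new_image = np.full((4, 4), 5.0)
pred = predict(model, pool_signature(extract_local_features(new_image)))
```

A real MMIR system would replace each stage with a far stronger component (learned descriptors, sophisticated aggregation, a discriminative classifier), but the three-step structure is the same.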
Despite advances in the field, human vision systems still substantially outperform their computer-based counterparts. In this thesis we therefore design a set of novel contributions for each step of the MMIR chain, aiming to improve the overall recognition performance.
In our work, we explore techniques from a variety of fields not traditionally related to Multimedia Retrieval, and embed them into effective MMIR frameworks. For example, we borrow the concept of image saliency from visual perception and use it to build low-level features. We employ Copula theory from economic statistics for feature aggregation. We re-use the notion of graded relevance, popular in web page ranking, in visual retrieval frameworks. Finally, we explore the synergy between semantic, photographic, affective and artistic analysis to study the aesthetics of images and videos.
We explain our novel solutions in detail and demonstrate their effectiveness for image categorization, video retrieval and image beauty assessment.

Data Science
© Université de Nice. Personal use of this material is permitted. The definitive version of this paper was published in Thesis and is available at: