Representation, information extraction, and summarization for automatic multimedia understanding

Harrando, Ismail
Thesis

Whether on TV or on the internet, video content production is seeing an unprecedented rise. With every big tech and media company entering the race for video sharing and streaming services, video is not only the dominant medium for entertainment, but is also expected to be the future of media consumption on the web for education, information and leisure.

Nevertheless, the traditional paradigm for multimedia management is proving incapable of keeping pace with the sheer volume of content created every day across disparate distribution channels. As a result, routine tasks that multimedia creators perform, such as archiving, editing, content organization and retrieval, become prohibitively costly or are reduced to an affordable minimum. On the user side, too, the amount of multimedia content published daily can be simply overwhelming; the need for shorter and more personalized content has never been more pronounced. Recommending, enriching and summarizing content can help sustain users’ engagement and encourage their interactions.

To advance the state of the art on both fronts, computers have to achieve a certain level of "multimedia understanding". In this thesis, we address the multiple challenges facing automatic media content processing and analysis, focusing our exploration on three axes:

1. Representing multimedia: With all its richness and variety, modeling and representing multimedia content can be a challenge in itself. We explore the potential of two such representations: as a knowledge graph, which allows advanced and consistent querying across the available corpora, and as embeddings, both semantic and textual, which serve as a basis for a content-based recommender system.

2. Describing multimedia: The textual component of multimedia, which can be automatically extracted from speech data, can be used to generate high-level descriptors, or annotations, for the content at hand. These descriptors can help both end-users and practitioners navigate, organize, and explore the content across several applications.

3. Summarizing multimedia: Multimodal content can be long, dense and complex. We thus investigate the possibility of extracting highlights from media content, both for narrative-focused summarization and for maximizing memorability.
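
To make the idea of embeddings serving as a basis for content-based recommendation (axis 1) more concrete, the following is a minimal sketch and not the system developed in the thesis. It assumes each media item is already represented by a fixed-length vector and ranks the other items by cosine similarity; the item names, the vectors, and the helper functions `cosine_similarity` and `recommend` are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query_id, embeddings, top_k=2):
    """Return the top_k items most similar to query_id as (item_id, score) pairs."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != query_id  # never recommend the item itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; real ones would come from a trained model.
embeddings = {
    "news_report": [0.9, 0.1, 0.0],
    "documentary": [0.8, 0.3, 0.1],
    "cooking_show": [0.0, 0.2, 0.9],
}

recommendations = recommend("news_report", embeddings, top_k=2)
for item_id, score in recommendations:
    print(f"{item_id}: {score:.3f}")
```

In this toy setting, items whose vectors point in similar directions are ranked first, which is the basic intuition behind recommending content from its representation rather than from user interaction history.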


Type:
Thesis
Date:
2022-05-13
Department:
Data Science
Eurecom Ref:
6897
Copyright:
© EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in Thesis and is available at :

PERMALINK : https://www.eurecom.fr/publication/6897