Multimedia content is one of the most widely used and intuitive ways of consuming information. There is no doubt that video content is making its way to the Web, with hundreds of thousands of hours of content being uploaded every day to social platforms such as YouTube. Can this huge amount of video content and its associated metadata be represented in a way that makes it easier to find, to re-use, and to address not only as a whole but also at different levels of granularity? Is the information contained in those video items sufficiently contextualized so that it can be effectively consumed by humans and machines?
In this thesis, we investigate how to best use available standards such as the Media Fragments URI specification, together with advanced annotation techniques, to turn multimedia content into a first-class citizen of the Web. To make this integration effective, we advocate the use of semantic technologies as a way to enable machines to perform these tasks automatically. We present approaches relying on different information retrieval and knowledge representation strategies for bridging the gap between the low-level visual features obtainable via traditional analysis techniques and the meaningful high-level annotations ready to be consumed. Aligning those concepts with standard vocabularies in the domain makes it possible for machines to interpret and reason over the information available in those documents, in order to offer innovative operations for browsing, enriching and hyperlinking media fragments, and ultimately to improve the way multimedia information is consumed.
We argue that the lack of context provided by a multimedia document taken in isolation hinders a proper understanding of the story being reported. International news items are a good example of this phenomenon. There is therefore a need to unveil other aspects of the story that, even if not explicitly present in the seed document, are crucial to fully capture the backstory. To deal with this problem, we propose an innovative conceptual model called the News Semantic Snapshot (NSS), designed to make explicit the wider context of a news event. Following a process called Named Entity Expansion, we query the Web to bring in other viewpoints about what is happening around us, drawn from the thousands of news articles and posts where we could potentially find those missing story details. We have also proposed an innovative Concentric-based approach that better spots those contextual entities by leveraging the duality between the so-called Core, which contains representative entities frequently mentioned in the related documents, and the entities that hold particular semantic relationships with the Core and shape the Crust around it.