Multimedia Semantics and Interaction
Multimedia Semantics and Interaction aims to provide semantic models for multimedia metadata and user social activity on the web in order to support users' complex information needs and interactions, such as exploring large information spaces, gathering heterogeneous and distributed information, or personalizing system behaviour. We make extensive use of Linked Data technologies to perform these tasks.
Event-centric Interfaces for Interacting with Media
Many websites contain information about scheduled events, and some of them display media captured at those events. This information is, however, often incomplete and always locked into the individual sites, which prevents users from assembling an overview of the media associated with an event across multiple websites.
We assembled a large collection of event and associated media descriptions, which we interlinked with the Linked Open Data cloud to form the so-called EventMedia dataset. We investigate how to automatically enrich both the textual description and the visual illustration of a known event from social media. Furthermore, we develop novel approaches for detecting unknown events at known locations based on social media activity.
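Interlinking event descriptions from different sites amounts to deciding when two records describe the same real-world event. A minimal sketch of such a matching step, using a made-up record schema and a deliberately simple title-plus-date heuristic (the actual EventMedia pipeline is more sophisticated):

```python
from datetime import date

# Hypothetical minimal event records from two sites; the field names
# and URIs are illustrative, not the actual EventMedia schema.
site_a = [
    {"title": "Radiohead Live!", "date": date(2012, 7, 14), "uri": "http://a.example/e1"},
    {"title": "Jazz Night", "date": date(2012, 7, 15), "uri": "http://a.example/e2"},
]
site_b = [
    {"title": "radiohead live", "date": date(2012, 7, 14), "uri": "http://b.example/42"},
]

def normalize(title):
    """Lowercase and strip punctuation so near-identical titles compare equal."""
    return "".join(c for c in title.lower() if c.isalnum() or c.isspace()).strip()

def link_events(src, dst):
    """Return (src_uri, dst_uri) pairs judged to describe the same event."""
    index = {(normalize(e["title"]), e["date"]): e["uri"] for e in dst}
    links = []
    for e in src:
        key = (normalize(e["title"]), e["date"])
        if key in index:
            links.append((e["uri"], index[key]))
    return links

print(link_events(site_a, site_b))
# → [('http://a.example/e1', 'http://b.example/42')]
```

Each resulting pair would typically be published as an owl:sameAs (or similar) link between the two datasets.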
Extracting Knowledge from Semi-Structured Text
The Web has become a large data space where millions of semi-structured texts, such as scientific, medical or news articles, forum and archived mailing list threads, and (micro-)blog posts, are available. These documents often contain rich semantics that remain hidden from machines. Natural Language Processing (NLP) tools and information extractors play a key role in extracting information from unstructured or semi-structured text. Recently, Linked Data entity extractors have emerged that provide, in addition to the type of each named entity, a URI that uniquely identifies it in the Web of Data.
We develop NERD, an evaluation framework that records and analyzes human ratings of Named Entity (NE) extraction and disambiguation tools applied to English plain-text articles. NERD enables the comparison of popular Linked Data entity extractors that expose APIs, such as AlchemyAPI, DBpedia Spotlight, Extractiv, OpenCalais and Zemanta. Given an article and a particular tool, a user can assess the precision of the extracted named entities, their typing, the Linked Data URI provided for disambiguation, and their subjective relevance to the text. All user interactions are stored in a database.
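The per-criterion assessment described above can be reduced to a simple computation once the human judgments are stored. A sketch, assuming a hypothetical flat representation of the ratings for one (article, tool) pair rather than NERD's actual database schema:

```python
# Hypothetical ratings: for each entity a given tool extracted from an
# article, a human judged its type, its disambiguation URI and its
# relevance to the text. Field names are illustrative only.
ratings = [
    {"entity": "Barack Obama", "type_ok": True,  "uri_ok": True,  "relevant": True},
    {"entity": "Washington",   "type_ok": True,  "uri_ok": False, "relevant": True},
    {"entity": "Tuesday",      "type_ok": False, "uri_ok": False, "relevant": False},
]

def precision(ratings, criterion):
    """Fraction of extracted entities judged correct on one criterion."""
    if not ratings:
        return 0.0
    return sum(r[criterion] for r in ratings) / len(ratings)

for criterion in ("type_ok", "uri_ok", "relevant"):
    print(criterion, round(precision(ratings, criterion), 2))
```

Aggregating these per-criterion precisions across articles is what makes the side-by-side comparison of the different extractors possible.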
Multimedia content is easy to produce but rather hard to find and to reuse on the Web. Digital photographs can easily be uploaded, communicated and shared in community portals such as Flickr, Picasa and Riya, while videos are available on portals such as YouTube, DailyMotion, Metacafe or Vimeo, to name a few. These systems allow their users to manually tag, comment on and annotate digital content, but they lack general support for fine-grained semantic description and look-up, especially for things "inside" multimedia content, such as an object in a video or a person depicted in a still image. Furthermore, while video consumption on the web is continuously increasing, a large part of this content is not accessible to various categories of users. For example, blind and deaf users have little access to this enormous amount of content, while digital technologies could, in theory, greatly improve the accessibility of rich media.
In this theme, we develop a standard means of uniquely identifying sub-parts of resources using URI fragment identifiers, and we explore how the accessibility of web videos can be improved by providing rich descriptions of video content in order to personalize its rendering according to a user's sensory deficiencies. The former has led to a W3C Recommendation named Media Fragments URI, while the latter has been deployed on the Dailymotion video sharing website.
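Media Fragments URI addresses sub-parts of a media resource through the fragment identifier, for example a temporal clip via `#t=10,20`. A minimal parser for that temporal dimension, covering only the plain-seconds form of Normal Play Time from the specification (not hh:mm:ss clock values or the spatial `xywh` dimension):

```python
from urllib.parse import urlparse, parse_qs

def parse_temporal_fragment(uri):
    """Extract (start, end) in seconds from a Media Fragments URI
    temporal dimension, e.g. video.mp4#t=10,20. An omitted start
    means 0; an omitted end means "until the end" (returned as None)."""
    fragment = urlparse(uri).fragment
    params = parse_qs(fragment)
    if "t" not in params:
        return None
    value = params["t"][0]
    if value.startswith("npt:"):  # optional NPT prefix allowed by the spec
        value = value[4:]
    start, _, end = value.partition(",")
    return (float(start) if start else 0.0,
            float(end) if end else None)

print(parse_temporal_fragment("http://example.org/video.mp4#t=10,20"))
# → (10.0, 20.0)
print(parse_temporal_fragment("http://example.org/video.mp4#t=,30.5"))
# → (0.0, 30.5)
```

Because the fragment is resolved client-side, a user agent can fetch or render only the addressed interval, which is what makes such URIs suitable for fine-grained annotation of things "inside" a video.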
Deploying Linked Data for Government
The Web is currently in a transition phase. After having been accessible mainly on personal computers, it is quickly becoming ubiquitous, entering every part and moment of our lives. New devices and new ways to use them are being created. The ubiquity of the Web also creates an unprecedented abundance of information. Data is flowing onto the Web, created by users, generated by sensors, and stored in ever-growing data farms. New ways to consume these data are still to be invented, allowing us to extract only their essence. On-demand filtering and processing of this huge data flow is required to provide us with the information we need.
However, even if the raw data is there, and even if the publishing and interlinking technology is there, the transition from raw published data to interlinked semantic data still needs to happen. Made of large raw data sources linked together, the Web of Data takes advantage of Semantic Web technologies to ensure the interoperability and intelligibility of the data. We develop specific technologies, mostly for Public Sector Information, for:
- publishing data as RDF graphs, a simple graph-based data model,
- linking these data sets together by identifying equivalent resources in other data sources,
- describing the vocabulary used in the published data through ontologies.
We contribute to the W3C Government Linked Data Working Group, which aims to provide standards and other information that help governments around the world publish their data as effective and usable Linked Data using Semantic Web technologies.
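The publish-and-link steps above can be sketched concretely. The following writes two triples in N-Triples syntax by hand rather than with an RDF library: one typing a government resource, one linking it to its DBpedia equivalent with owl:sameAs. The government dataset URIs are made up for illustration; only the W3C vocabulary URIs are real.

```python
def triple(s, p, o):
    """Serialize one triple in N-Triples syntax (URI objects only;
    literal objects would need quoting and escaping)."""
    return f"<{s}> <{p}> <{o}> ."

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"

dataset = [
    # 1. publish the raw record as an RDF graph
    triple("http://data.gov.example/city/paris", RDF_TYPE,
           "http://dbpedia.org/ontology/City"),
    # 2. link it to the equivalent resource in another data source
    triple("http://data.gov.example/city/paris", OWL_SAMEAS,
           "http://dbpedia.org/resource/Paris"),
]

print("\n".join(dataset))
```

The owl:sameAs link is what turns an isolated government dataset into part of the Web of Data: a consumer dereferencing the DBpedia resource can discover and merge the government-published facts.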