Roomba: An extensible framework to validate and build dataset profiles

Assaf, Ahmad; Troncy, Raphaël; Senart, Aline

PROFILES 2015, 2nd International Workshop on Dataset Profiling & Federated Search for Linked Data, Main conference ESWC15, 31 May-4 June 2015, Portoroz, Slovenia

Best Paper Award

Linked Open Data (LOD) has emerged as one of the largest collections of interlinked datasets on the web. In order to benefit from this mine of data, one needs access to descriptive information about each dataset (its metadata). This information can be used to delay data entropy, to enhance dataset discovery, exploration and reuse, and to help data portal administrators detect and eliminate spam. However, such metadata is currently limited to a few data portals, where it is usually provided manually and is therefore often incomplete and inconsistent in quality. To address these issues, we propose a scalable automatic approach for extracting, validating, correcting and generating descriptive linked dataset profiles. This approach applies several techniques in order to check the validity of the metadata provided and to generate descriptive and statistical information for a particular dataset or for an entire data portal.

Title: Roomba: An extensible framework to validate and build dataset profiles
Keywords: Linked Data, Dataset Profile, Metadata, Data Quality
Department: Data Science
Eurecom ref: 4542
Copyright: CEUR
