Engineering school and research center in digital sciences

A two-fold quality assurance approach for dynamic knowledge bases: The 3cixty use case

Mihindukulasooriya, Nandana; Rizzo, Giuseppe; Troncy, Raphaël; Corcho, Oscar; Garcia-Castro, Raul

ESWC 2016, International Workshop on Completing and Debugging the Semantic Web (CoDeS'16), May 30, 2016, Heraklion, Greece

The 3cixty platform relies on a continuous integration workflow for the generation and maintenance of evolving knowledge bases in the domain of culture and tourism. This approach is inspired by common practices in the software engineering industry, in which continuous integration is widely used for quality assurance purposes. The objective of this paper is to present a similar approach for knowledge base population and publishing. The proposed approach consists of two main steps: (i) exploratory testing, and (ii) fine-grained analysis. In the exploratory testing step, the knowledge base is tested for patterns that may reveal erroneous data or outliers that could indicate inaccuracies. This step is knowledge-base agnostic and provides inputs for the second step. In the fine-grained analysis step, specific tests are developed for a particular knowledge base according to the data model and the pre-defined constraints that shape the data. More precisely, a set of predefined queries is executed and the results are compared to the expected answers (similar to unit testing in software engineering) in order to automatically validate that the knowledge base fulfills a set of requirements. The main objective of this approach is to detect and flag potential defects as early as possible in the data publishing process, and to eliminate or minimize undesirable outcomes in the applications that depend on the knowledge base, typically user interfaces that let users explore the data but rely on a particular shape of that data. This two-fold approach proves to be critical when the knowledge base is continuously evolving, not necessarily in a monotonic way, and when real-world applications, such as the 3cixty multi-device application, depend heavily on it.
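The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the knowledge base is a toy in-memory set of triples, every identifier (`ex:event1`, `ex:title`, and so on) is hypothetical, and plain Python functions stand in for the predefined queries mentioned in the abstract. The exploratory step here uses one simple, schema-independent pattern (several values for the same subject and predicate), and the fine-grained step compares each query result with its expected answer, in the style of a unit test.

```python
from collections import Counter

# Toy knowledge base: a set of (subject, predicate, object) triples.
# All identifiers are hypothetical and not taken from 3cixty.
KB = {
    ("ex:event1", "rdf:type", "ex:Event"),
    ("ex:event1", "ex:title", "Jazz night"),
    ("ex:event1", "ex:place", "ex:place1"),
    ("ex:event2", "rdf:type", "ex:Event"),
    ("ex:event2", "ex:place", "ex:place1"),   # no title: a defect
    ("ex:place1", "rdf:type", "ex:Place"),
    ("ex:place1", "ex:latitude", "43.70"),
    ("ex:place1", "ex:latitude", "34.70"),    # conflicting value: suspicious
}


def exploratory_profile(kb):
    """Step (i), schema-agnostic: flag (subject, predicate) pairs
    that carry more than one value, as candidates for inspection."""
    counts = Counter((s, p) for s, p, _ in kb)
    return {key: n for key, n in counts.items() if n > 1}


def missing_property(kb, cls, prop):
    """Step (ii) query: instances of `cls` that have no value for `prop`."""
    instances = {s for s, p, o in kb if p == "rdf:type" and o == cls}
    with_prop = {s for s, p, _ in kb if p == prop}
    return instances - with_prop


def run_tests(kb, tests):
    """Run each query and compare its result with the expected answer.
    Returns the failing test names mapped to the actual results."""
    failures = {}
    for name, (query, expected) in tests.items():
        actual = query(kb)
        if actual != expected:
            failures[name] = actual
    return failures


# Constraints derived from an assumed data model: each test pairs a
# query with the answer expected from a defect-free knowledge base.
TESTS = {
    "every event has a title": (
        lambda kb: missing_property(kb, "ex:Event", "ex:title"), set()),
    "every place has a latitude": (
        lambda kb: missing_property(kb, "ex:Place", "ex:latitude"), set()),
}

suspicious = exploratory_profile(KB)
failures = run_tests(KB, TESTS)
print("Exploratory findings:", suspicious)
print("Failed tests:", failures)
```

In a continuous integration setting, a non-empty `failures` dictionary would typically block or flag the publication of the new knowledge base version, while the exploratory findings would suggest new constraints to encode as tests.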


Title: A two-fold quality assurance approach for dynamic knowledge bases: The 3cixty use case
Keywords: Data quality, data validation, continuous integration, dynamic knowledge base
Type: Conference
Language: English
City: Heraklion
Country: Greece
Date:
Department: Data Science
Eurecom ref: 4903
Copyright: © Springer. Personal use of this material is permitted. The definitive version of this paper was published in ESWC 2016, International Workshop on Completing and Debugging the Semantic Web (CoDeS'16), May 30, 2016, Heraklion, Greece and is available at:
Bibtex:
@inproceedings{EURECOM+4903,
  year = {2016},
  title = {{A} two-fold quality assurance approach for dynamic knowledge bases: {T}he 3cixty use case},
  author = {{M}ihindukulasooriya, {N}andana and {R}izzo, {G}iuseppe and {T}roncy, {R}apha{\"e}l and {C}orcho, {O}scar and {G}arcia-{C}astro, {R}aul},
  booktitle = {{ESWC} 2016, {I}nternational {W}orkshop on {C}ompleting and {D}ebugging the {S}emantic {W}eb ({C}o{D}e{S}'16), {M}ay 30, 2016, {H}eraklion, {G}reece},
  address = {{H}eraklion, {GR}{\`{E}}{CE}},
  month = {05},
  url = {http://www.eurecom.fr/publication/4903}
}