Graduate School and Research Center in Digital Sciences

Let's make it dirty with BART!

Santoro, Donatello; Arocena, Patricia C; Glavic, Boris; Mecca, Giansalvatore; Miller, Renée J; Papotti, Paolo

SEBD 2018, 26th Italian Symposium on Advanced Database Systems, 24-27 June 2018, Castellaneta Marina, Taranto, Italy / Also published in CEUR Workshop Proceedings Vol.2161/2018

In the last few years, many automatic or semi-automatic data-repairing algorithms have been proposed to improve the quality of a given database. Given the richness of these research proposals, it is important to conduct experimental evaluations to assess each tool's potential. Bart is an open-source error-generation system conceived to support thorough experimental evaluations of these data-repairing systems. In this paper we discuss how generating errors in data is a complex problem with several facets. We introduce the important notions of detectability and repairability of an error, which stand at the core of Bart. Then, we show how changing the features of errors can significantly influence the performance of the tools. Finally, we put five data-repairing algorithms to work on dirty data of various kinds generated using Bart, and discuss their performance.

Title: Let's make it dirty with BART!
Department: Data Science
Eurecom ref: 5651
Copyright: CEUR
Bibtex:
@inproceedings{EURECOM+5651,
  year = {2018},
  title = {{L}et's make it dirty with {BART}!},
  author = {{S}antoro, {D}onatello and {A}rocena, {P}atricia {C} and {G}lavic, {B}oris and {M}ecca, {G}iansalvatore and {M}iller, {R}en{\'e}e {J} and {P}apotti, {P}aolo},
  booktitle = {{SEBD} 2018, 26th {I}talian {S}ymposium on {A}dvanced {D}atabase {S}ystems, 24-27 {J}une 2018, {C}astellaneta {M}arina, {T}aranto, {I}taly / {A}lso published in {CEUR} {W}orkshop {P}roceedings {V}ol.2161/2018},
  address = {{T}aranto, {ITALY}},
  month = {06},
  url = {}
}