Let's make it dirty with BART!

Santoro, Donatello; Arocena, Patricia C; Glavic, Boris; Mecca, Giansalvatore; Miller, Renée J; Papotti, Paolo

SEBD 2018, 26th Italian Symposium on Advanced Database Systems, 24-27 June 2018, Castellaneta Marina, Taranto, Italy / Also published in CEUR Workshop Proceedings, Vol. 2161/2018

In the last few years, many automatic or semi-automatic data-repairing algorithms have been proposed to improve the quality of a given database. Given the richness of research proposals, it is important to conduct experimental evaluations to assess each tool's potential. Bart is an open-source error-generation system conceived to support thorough experimental evaluations of these data-repairing systems. In this paper we discuss how generating errors in data is a complex problem with several facets. We introduce the important notions of detectability and repairability of an error, which stand at the core of Bart. Then, we show how changing the features of errors can significantly influence the performance of the tools. Finally, we run five data-repairing algorithms on dirty data of various kinds generated using Bart, and discuss their performance.

Type: Conference
City: Taranto
Date: 2018-06-24
Department: Data Science
Eurecom Ref: 5651
