Cleaning data with Llunatic

Geerts, Floris; Mecca, Giansalvatore; Papotti, Paolo; Santoro, Donatello
The VLDB Journal, 8 November 2019

Data-cleaning (or data-repairing) is considered a crucial problem in many database-related tasks. It consists in making a database consistent with respect to a given set of constraints. In recent years, repairing methods have been proposed for several classes of constraints. These methods, however, tend to hard-code the strategy to repair conflicting values and are specialized toward specific classes of constraints. In this paper we develop a general chase-based repairing framework, referred to as LLUNATIC, in which repairs can be obtained for a large class of constraints and by using different strategies to select preferred values. The framework is based on an elegant formalization in terms of labeled instances and partially ordered preference labels. In this context, we revisit concepts such as upgrades, repairs and the chase. In LLUNATIC, various repairing strategies can be slotted in, without the need for changing the underlying implementation. Furthermore, LLUNATIC is the first data repairing system which is DBMS-based. We report experimental results that confirm its good scalability and show that various instantiations of the framework result in repairs of good quality.
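
As a rough illustration of the idea that the strategy for choosing a preferred value can be swapped without touching the repair routine, the sketch below repairs violations of a single functional dependency on a toy table. It is not the LLUNATIC chase and does not model labeled instances or partially ordered preference labels. The names `repair_fd`, `most_frequent` and `first_seen`, as well as the sample rows, are hypothetical and chosen only for this example.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List

Row = Dict[str, str]
Strategy = Callable[[List[str]], str]


def most_frequent(values: List[str]) -> str:
    """Hypothetical strategy: prefer the value that occurs most often."""
    return Counter(values).most_common(1)[0][0]


def first_seen(values: List[str]) -> str:
    """Hypothetical strategy: prefer the value that appears first."""
    return values[0]


def repair_fd(rows: List[Row], lhs: str, rhs: str, choose: Strategy) -> List[Row]:
    """Return a copy of rows in which the functional dependency lhs -> rhs holds.

    Rows that agree on lhs but disagree on rhs are in conflict; the
    conflict is resolved by the pluggable `choose` strategy.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        groups[row[lhs]].append(row[rhs])
    preferred = {key: choose(vals) for key, vals in groups.items()}
    # Build new dicts so the input rows are left unmodified.
    return [{**row, rhs: preferred[row[lhs]]} for row in rows]


# Toy data: the first three rows violate zip -> city.
rows = [
    {"zip": "10001", "city": "NYC"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "94105", "city": "San Francisco"},
]

by_frequency = repair_fd(rows, "zip", "city", most_frequent)
by_order = repair_fd(rows, "zip", "city", first_seen)
```

Calling `repair_fd` with a different strategy changes which value wins a conflict while the repair routine itself stays the same, which is the kind of separation the abstract describes at a much more general level.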

Type: Journal
Date: 2019-11-08
Department: Data Science
Eurecom Ref: 6107
Copyright: © Springer. Personal use of this material is permitted. The definitive version of this paper was published in The VLDB Journal, 8 November 2019 and is available at: https://doi.org/10.1007/s00778-019-00586-5

Permalink: https://www.eurecom.fr/publication/6107