Mining expressive rules in knowledge graphs

Ahmadi, Naser; Huynh, Viet-Phi; Meduri, Vamsi; Ortona, Stefano; Papotti, Paolo
ACM Journal of Data and Information Quality (JDIQ), Vol. 1, N°1, Article 8, May 2020

We describe RuDiK, an algorithm and a system for mining declarative rules over RDF knowledge graphs (KGs). RuDiK can discover rules expressing both positive relationships between KG elements, e.g., “if two persons share at least one parent, they are likely to be siblings”, and negative patterns identifying data contradictions, e.g., “if two persons are married, one cannot be the child of the other" or “the birth year for a person cannot be bigger than her graduation year". While the first kind of rules identify new facts in the KG, the second kind enables the detection of incorrect triples and the generation of (training) negative examples for learning algorithms. High quality rules are also critical for any reasoning task involving the KGs. Our approach increases the expressive power of the supported rule language w.r.t. the existing systems. RuDiK discovers rules containing (i) comparisons among literal values and (ii) selection conditions with constants. Richer rules increase the accuracy and the coverage over the facts in the KG for the task at hand. This is achieved with aggressive pruning of the search space and with disk-based algorithms, which enable the execution of the system in commodity machines. Also, RuDiK is robust to errors and missing data in the input graph. It discovers approximate rules with a measure of support that is aware of the quality issues. Our experimental evaluation with real-world KGs shows that RuDiK does better than existing solutions in terms of scalability and that it can identify effective rules for different target applications.


DOI
HAL
Type:
Journal
Date:
2020-05-01
Department:
Data Science
Eurecom Ref:
6156
Copyright:
© ACM, 2020. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Journal of Data and Information Quality (JDIQ), Vol. 1, N°1, Article 8, May 2020 https://doi.org/10.1145/3371315

PERMALINK : https://www.eurecom.fr/publication/6156