Journal of Data and Information Quality, Special issue " Metadata Discovery for Assessing Data Quality", July 2020
Entity-centric knowledge graphs (KGs) are now popular to collect facts about entities. KGs have rich schemas, with a large number of different types and predicates to describe the entities and their relationships. On these rich schemas, logical rules are used to represent dependencies between the data elements. While rules are useful in query answering, data curation, and other tasks, they usually do not come with the KGs. Such rules have to be manually defined or discovered with the help of rule mining methods.We believe this rule-collection task should be done collectively to better capitalize our understanding of the data and to avoid redundant work conducted on the same KGs. For this reason, we introduce RuleHub, our extensible corpus of rules for public KGs. RuleHub provides functionalities for the archival and the retrieval of rules to all users, with an extensible architecture that does not constrain the KG or the type of rules supported. We are populating the corpus with thousands of rules from the most popular KGs and report on our experiments on automatically characterizing the quality of a rule with statistical measures.
© ACM, 2020. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Journal of Data and Information Quality, Special issue " Metadata Discovery for Assessing Data Quality", July 2020 https://doi.org/10.1145/3409384