Semantic enrichment for recommendation of primary studies in a systematic literature review

Rizzo, Giuseppe; Tomassetti, Federico; Vetrò, Antonio; Ardito, Luca; Torchiano, Marco; Morisio, Maurizio; Troncy, Raphaël
Digital Scholarship in the Humanities, Oxford University Press, 13 August 2015

A Systematic Literature Review (SLR) identifies, evaluates, and synthesizes the literature available on a given topic. This generally requires a significant human workload and is prone to subjectivity bias that could affect the results of the review. Automated document classification can be a valuable tool for recommending which studies to select. In this article, we propose an automated pre-selection approach based on text mining and semantic enrichment techniques. Each document is first processed by a named entity extractor. The DBpedia URIs produced by the entity linking process are used as external sources of information. Our system collects the bag of words of those sources and adds it to the initial document. A Multinomial Naive Bayes classifier then decides whether the enriched document belongs to the positive example set or not. We used an existing, manually performed SLR as the benchmark data set. We trained our system with different configurations of relevant documents and tested the quality of our approach with an empirical assessment. Results show an 18% reduction in the manual workload that a human researcher has to spend, while holding a remarkable 95% recall, an important condition given the very nature of SLRs. We also measured the effect of the enrichment process on the precision of the classifier and observed a gain of up to 5%.
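The pipeline described in the abstract (entity linking, enrichment with the bag of words of the linked sources, then Multinomial Naive Bayes classification) can be illustrated with a minimal sketch. It is not the authors' implementation. The `ENTITY_TEXT` dictionary is a hypothetical stand-in for the named entity extractor and the DBpedia lookup, and the training sentences, function names, and the add-one smoothing choice are assumptions made only for illustration.

```python
import math
from collections import Counter

# Hypothetical stand-in for entity linking plus DBpedia lookup: maps a surface
# term to text that would come from the linked resource. Not real DBpedia data.
ENTITY_TEXT = {
    "junit": "unit testing framework software",
    "scrum": "agile software development process",
    "soccer": "football sport team",
}


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def enrich(text, entity_text=ENTITY_TEXT):
    """Return the document tokens plus the bag of words of each recognized entity."""
    tokens = tokenize(text)
    extra = []
    for tok in tokens:
        extra.extend(tokenize(entity_text.get(tok, "")))
    return tokens + extra


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (an assumption)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        return self

    def log_score(self, tokens, c):
        v = len(self.vocab)
        score = self.prior[c]
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((self.counts[c][tok] + 1) / (self.totals[c] + v))
        return score

    def predict(self, tokens):
        return max(self.classes, key=lambda c: self.log_score(tokens, c))


# Toy training set: True marks a relevant (positive) study, False an irrelevant one.
train = [
    ("junit tests improve code quality", True),
    ("scrum teams and testing effort", True),
    ("soccer league results", False),
    ("sport season team ranking", False),
]
model = MultinomialNB().fit([enrich(t) for t, _ in train], [y for _, y in train])

relevant = model.predict(enrich("scrum adoption study"))
irrelevant = model.predict(enrich("soccer tournament"))
```

In this toy setup, a document that mentions only "agile process" can still be matched to the positive set, because those words entered the vocabulary through the enrichment of the training documents rather than through their original text.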

Data Science
Eurecom Ref:
Oxford University Press