NERD: evaluating named entity recognition tools in the web of data

Rizzo, Giuseppe; Troncy, Raphaël

ISWC 2011, Workshop on Web Scale Knowledge Extraction (WEKEX'11), October 23-27, 2011, Bonn, Germany

The Web of data promotes the idea that more and more data are interconnected. A step towards this goal is to bring more structured annotations to existing documents using common vocabularies or ontologies. Semi-structured texts such as scientific, medical or news articles, as well as forum and archived mailing list threads or (micro-)blog posts, can hence be semantically annotated. Named Entity (NE) extractors play a key role in extracting structured information by identifying features, also called entities, and by linking them to other web resources by means of typed inferences. In this article, we propose a thorough evaluation of five popular Linked Data entity extractors which expose APIs: AlchemyAPI, DBpedia Spotlight, Extractiv, OpenCalais and Zemanta. We present NERD, an evaluation framework we have developed, and the results of a controlled evaluation performed by human beings that consists of assigning a Boolean value to three criteria: entity detection, entity type and entity disambiguation.

Title: NERD: evaluating named entity recognition tools in the web of data
Keywords: Entity extraction, Linked Data, Natural Language Processing, Evaluation of Linked Data entity extraction tools
Type: Conference
Language: English
City: Bonn
Country: Germany
Date:
Department: Data Science
Eurecom ref: 3517
Copyright: © Springer. Personal use of this material is permitted. The definitive version of this paper was published in ISWC 2011, Workshop on Web Scale Knowledge Extraction (WEKEX'11), October 23-27, 2011, Bonn, Germany and is available at :
Bibtex:
@inproceedings{EURECOM+3517,
  year = {2011},
  title = {{NERD}: evaluating named entity recognition tools in the web of data},
  author = {{R}izzo, {G}iuseppe and {T}roncy, {R}apha{\"e}l},
  booktitle = {{ISWC} 2011, {W}orkshop on {W}eb {S}cale {K}nowledge {E}xtraction ({WEKEX}'11), {O}ctober 23-27, 2011, {B}onn, {G}ermany},
  address = {{B}onn, {GERMANY}},
  month = {10},
  url = {http://www.eurecom.fr/publication/3517}
}