Analysis of named entity recognition and linking for tweets

Derczynski, Leon; Maynard, Diana; Rizzo, Giuseppe; van Erp, Marieke; Aswani, Niraj; Troncy, Raphaël; Bontcheva, Kalina
Information Processing and Management, Volume 51, N°2, March 2015, Elsevier

Applying natural language processing for mining and intelligent information access to tweets (a form of microblog) is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Information extraction from tweets is typically performed in a pipeline, comprising consecutive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). In this work, we describe a new Twitter entity disambiguation dataset, and conduct an empirical analysis of named entity recognition and disambiguation, investigating how robust a number of state-of-the-art systems are on such noisy texts, what the main sources of error are, and which problems should be further investigated to improve the state of the art.


DOI: 10.1016/j.ipm.2014.10.006
Type: Journal
Date: 2015-03-01
Department: Data Science
Eurecom Ref: 4250
Copyright: © Elsevier. Personal use of this material is permitted. The definitive version of this paper was published in Information Processing and Management, Volume 51, N°2, March 2015, Elsevier, and is available at: http://dx.doi.org/10.1016/j.ipm.2014.10.006

PERMALINK: https://www.eurecom.fr/publication/4250