A comparative study of n-gram and skip-gram for clinical concepts extraction

Sabra, Susan; Sabeeh, Vian
CSCI 2020, International Conference on Computational Science and Computational Intelligence, 16-18 December 2020, Las Vegas, NV, USA

State-of-the-art technologies for clinical knowledge extraction are essential for a clinical decision support system (CDSS) to predict a diagnosis. Such a process requires automatic analysis of a patient’s health data. The unstructured part of the data in electronic health records (EHR) is critical, as it may contain hidden risk factors. In this paper, we present a comparative study of two well-known techniques, N-gram and Skip-gram, for enhancing the extraction of risk-factor concepts from clinical narratives after applying initial natural language processing (NLP) techniques. We evaluate both techniques on a case-study dataset of records of patients with venous thromboembolism (VTE). In this comparison, N-gram achieved better precision, while Skip-gram performed better in terms of recall.
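To make the two techniques concrete, the following is a minimal sketch of how candidate phrases could be generated from a tokenized clinical sentence. It is not the authors' implementation: it assumes that "Skip-gram" here means k-skip-n-grams (ordered token tuples that may skip up to k tokens in total) rather than the word2vec Skip-gram model, and the function names and the example sentence are hypothetical.

```python
from itertools import combinations


def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(tokens, n, k):
    """Return k-skip-n-grams: n tokens kept in their original order,
    with at most k tokens skipped in total between the first and last."""
    grams = []
    for i in range(len(tokens)):
        # Tokens that may follow tokens[i]: the next n - 1 + k positions.
        window = tokens[i + 1:i + n + k]
        # Choose the remaining n - 1 tokens from the window, preserving order.
        for rest in combinations(range(len(window)), n - 1):
            grams.append((tokens[i],) + tuple(window[j] for j in rest))
    return grams


# Hypothetical example sentence, not taken from the study's dataset.
tokens = "patient denies history of deep vein thrombosis".split()

bigrams = ngrams(tokens, 2)
skip_bigrams = skipgrams(tokens, 2, 1)

# Pairs that only the skip-gram variant produces, e.g. ("deep", "thrombosis").
extra = [g for g in skip_bigrams if g not in bigrams]
print(len(bigrams), len(skip_bigrams))
print(extra)
```

With k = 0 the skip-gram function reduces to plain n-grams, and every n-gram also appears among the skip-grams. The skip-gram variant therefore proposes a larger set of candidate phrases, which is at least consistent with the recall and precision trade-off reported in the abstract, although the paper's actual pipeline may differ.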


Type: Conference
City: Las Vegas
Date: 2020-12-16
Department: Data Science
Eurecom Ref: 6593
Copyright: © 2020 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
PERMALINK : https://www.eurecom.fr/publication/6593