A comparative study of n-gram and skip-gram for clinical concepts extraction

Sabra, Susan; Sabeeh, Vian
CSCI 2020, International Conference on Computational Science and Computational Intelligence, 16-18 December 2020, Las Vegas, NV, USA

State-of-the-art clinical knowledge extraction is essential for a clinical decision support system (CDSS) that predicts a diagnosis, and such prediction requires automatic analysis of a patient’s health data. The unstructured part of the data in electronic health records (EHR) is critical, as it may contain hidden risk factors. In this paper we present a comparative study of two well-known techniques, N-gram and Skip-gram, for improving the extraction of risk-factor concepts from clinical narratives after initial natural language processing (NLP) steps have been applied. We evaluate both techniques on a case-study dataset of records from patients with venous thromboembolism (VTE). In this comparison, N-gram achieved higher precision, while Skip-gram achieved higher recall.
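To make the two techniques concrete, the following minimal sketch generates candidate phrases from a tokenized sentence. It assumes that "Skip-gram" here means k-skip-n-grams (ordered token tuples that may skip up to k intervening tokens), not the word2vec skip-gram model; the abstract does not say which is intended. The function names, the parameters `n` and `k`, and the example sentence are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def ngrams(tokens, n):
    """Return all contiguous n-token tuples, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(tokens, n, k):
    """Return k-skip-n-grams: n tokens kept in order, skipping at most k tokens in total.

    Assumption: this is the gapped n-gram sense of "skip-gram",
    not the word2vec training objective.
    """
    result = []
    for i in range(len(tokens)):
        # Tokens that may fill the remaining n - 1 positions after tokens[i].
        window = tokens[i + 1:i + n + k]
        for picks in combinations(range(len(window)), n - 1):
            result.append((tokens[i],) + tuple(window[j] for j in picks))
    return result


# Hypothetical clinical phrase, used only for illustration.
tokens = "history of deep vein thrombosis".split()
bigrams = ngrams(tokens, 2)
skip_bigrams = skipgrams(tokens, 2, 1)
print(len(bigrams), len(skip_bigrams))
```

Under this assumption, every contiguous bigram is also a 1-skip bigram, and the skip variant adds gapped pairs such as `("deep", "thrombosis")`. A larger candidate set of this kind is consistent with the reported pattern of higher recall for Skip-gram and higher precision for N-gram, although the sketch does not reproduce the paper's actual pipeline or evaluation.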

