K-CAP 2021, 11th ACM Knowledge Capture Conference, 2-3 December 2021, New York, USA (Virtual Event)
Traditional topic modeling approaches generally rely on document-term co-occurrence statistics to find latent topics in a collection of documents. However, relying only on such statistics can yield incoherent or hard-to-interpret results for end users in the many applications where the goal is to interpret the resulting topics (e.g., labeling documents, comparing corpora, or guiding content exploration). In this work, we propose to leverage external common sense knowledge, i.e., information from the real world beyond word co-occurrence, to find topics that are more coherent and more easily interpretable by humans. We introduce the Common Sense Topic Model (CSTM), a novel and efficient approach that augments clustering with knowledge extracted from the ConceptNet knowledge graph. We evaluate this approach on several datasets alongside commonly used models, using both automatic and human evaluation, and show that it agrees more closely with human judgement. The code for the experiments, the training data, and the human evaluation are available at https://github.com/D2KLab/CSTM.
© ACM, 2021. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in K-CAP 2021, 11th ACM Knowledge Capture Conference, 2-3 December 2021, New York, USA (Virtual Event), http://doi.org/10.1145/3460210.3493586