Pre-trained word embeddings are an essential building block for many NLP systems and applications, notably when labeled data is scarce. However, because they compress word meanings into a fixed-dimensional representation, they usually lack interpretability beyond a measure of similarity and linear analogies that do not always reflect real-world word relatedness, which can be important for many NLP applications. In this paper, we propose a model that extracts topics from text documents without any human supervision, based on the common-sense knowledge available in ConceptNet, a semantic concept graph that explicitly encodes real-world relations between words. By combining ConceptNet's knowledge graph with graph embeddings, our approach outperforms baselines in the zero-shot setting while generating a human-understandable explanation for its predictions through the knowledge graph. We study the importance of several modeling choices and design criteria, and we demonstrate that the model can be used to label data for a supervised classifier, which achieves even better performance without relying on any human-annotated training data. We publish the code of our approach at https://github.com/D2KLab/ZeSTE and we provide a user-friendly demo at https://zeste.tools.eurecom.fr/.
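To make the idea of explainable zero-shot topic scoring concrete, the following is a minimal sketch and not the implementation published in the repository above. It assumes that each candidate topic label comes with a "neighborhood" of related terms and relatedness weights. The toy neighborhoods, weights, and function names (`tokenize`, `score_document`, `predict`) are invented for illustration; they are not taken from ConceptNet, and the sketch leaves out the graph-embedding component mentioned in the abstract.

```python
import re
from collections import defaultdict

# Hypothetical toy neighborhoods: label -> {related term: relatedness weight}.
# These entries are invented for illustration and are NOT ConceptNet data.
TOY_NEIGHBORHOODS = {
    "sport": {"sport": 1.0, "football": 0.8, "match": 0.6, "team": 0.5},
    "politics": {"politics": 1.0, "election": 0.8, "vote": 0.7, "government": 0.7},
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def score_document(text, neighborhoods):
    """Score each label by summing the weights of document tokens found
    in that label's neighborhood. Also return, per label, the matched
    terms and their accumulated weights, which serve as the explanation."""
    tokens = tokenize(text)
    scores = {}
    explanations = {}
    for label, related in neighborhoods.items():
        matched = defaultdict(float)
        for tok in tokens:
            if tok in related:
                matched[tok] += related[tok]
        scores[label] = sum(matched.values())
        explanations[label] = dict(matched)
    return scores, explanations


def predict(text, neighborhoods):
    """Return the best-scoring label, its score, and its explanation."""
    scores, explanations = score_document(text, neighborhoods)
    best = max(scores, key=scores.get)
    return best, scores[best], explanations[best]


label, score, why = predict("The team won the football match", TOY_NEIGHBORHOODS)
print(label, why)
```

Because the score is a sum over matched terms, the prediction can be traced back to the specific words that triggered it. This is the kind of explanation the abstract refers to, although the actual model may weight and select terms differently.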
Explainable zero-shot topic extraction using a common-sense knowledge graph
LDK 2021, 3rd Conference on Language, Data and Knowledge, 1-3 September 2021, Zaragoza, Spain
© Cost. Personal use of this material is permitted. The definitive version of this paper was published in LDK 2021, 3rd Conference on Language, Data and Knowledge, 1-3 September 2021, Zaragoza, Spain and is available at: http://dx.doi.org/10.4230/OASIcs.LDK.2021.17
PERMALINK: https://www.eurecom.fr/publication/6538