Digital DATA Talk: "Modeling Knowledge Incorporation into Topic Models and Their Evaluation"

Silvia Terragni - Ph.D. student in Computer Science at the University of Milan-Bicocca, Italy
Data Science

Date: Thu, 06/17/2021, 15:00 to 17:00
Location: EURECOM

Abstract: Topic models are statistical methods that aim to extract the themes, or "topics", from large collections of documents. Knowledge associated with the documents (e.g., document labels or pre-trained representations) may also be available and can be exploited to improve the quality of the resulting topics. In this talk, I will review different methods for incorporating such knowledge into topic models. Moreover, because of their stochastic and unsupervised nature, topic models are difficult to evaluate. I will therefore discuss the issues surrounding their evaluation and show how to ensure a fairer comparison between models.

References:
- Terragni, S., Fersini, E., & Messina, E. (2020). Constrained relational topic models. Information Sciences, 512, 581-594.
- Terragni, S., Nozza, D., Fersini, E., & Messina, E. (2020). Which Matters Most? Comparing the Impact of Concept and Document Relationships in Topic Models. In Insights @ EMNLP 2020 (pp. 32-40).
- Bianchi, F., Terragni, S., & Hovy, D. (2021). Pre-training is a hot topic: Contextualized document embeddings improve topic coherence. In ACL 2021 (to appear).
- Bianchi, F., Terragni, S., Hovy, D., Nozza, D., & Fersini, E. (2021). Cross-lingual contextualized topic models with zero-shot learning. In EACL 2021 (pp. 1676-1683).
- Terragni, S., Fersini, E., Galuzzi, B. G., Tropeano, P., & Candelieri, A. (2021). OCTIS: Comparing and Optimizing Topic models is Simple! In EACL 2021: System Demonstrations (pp. 263-270).

Bio: Silvia Terragni is a Ph.D. student in Computer Science at the University of Milan-Bicocca, Italy, and is currently a virtual visiting student at EURECOM. Her research focuses mainly on topic modeling and NLP. She has authored several papers accepted at top-ranked NLP venues (ACL, EACL, EMNLP).

Data Science Seminars:
- https://ds.eurecom.fr/seminars/
- https://mediaserver.eurecom.fr/channels/#data-science-seminars (internal)