Online misinformation is a major problem in today's society, as the volume of shared information keeps growing. Misinformation has affected topics such as health (the COVID-19 ``infodemic''), politics (US elections, Brexit) and the environment (climate change denial). While dedicated fact-checking initiatives exist, they face complex challenges: misinformation is easy to create and spreads fast, whereas fact-checking is time-consuming and does not scale well. Fact-checkers therefore often have to focus their effort on the most viral claims. In this work, we present research that can help fact-checkers verify online content.
We first propose automatic approaches to extract several textual features from social media posts. Using BERT-based models, we detect COVID-19-related conspiracy theories and persuasion techniques in tweets and memes, reaching state-of-the-art results.
We also detect emotion, sentiment and political leaning in social media posts, enabling an in-depth analysis of the social discourse around COVID-19. In addition, we study ``tropes'', easily recognizable devices used in narratives to convey a specific theme or idea. We annotate tweets with nine tropes related to vaccines and immigration, and propose automatic models to detect them. To understand how these textual features relate to one another, we perform a correlation analysis between them.
As large language models are becoming prominent in natural language processing research, we analyze their ability to detect conspiracy theories and persuasion techniques, and explore the impact of class-label definitions on annotation performance. We find that better definitions lead to better results, and propose a way to generate well-performing definitions.
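To make the role of class-label definitions concrete, here is a minimal sketch of how an annotation prompt might embed one definition per label before showing the post to an LLM. The labels and definition texts are illustrative only, not the ones used in our experiments.

```python
# Hypothetical label definitions (illustrative, not our actual ones).
DEFINITIONS = {
    "loaded_language": "Uses emotionally charged words to influence the reader.",
    "appeal_to_fear": "Promotes an idea by instilling anxiety or panic.",
    "no_technique": "No persuasion technique is present.",
}

def build_prompt(post: str, definitions: dict) -> str:
    """Assemble a zero-shot classification prompt that includes an
    explicit definition for each class label."""
    defs = "\n".join(f"- {label}: {text}" for label, text in definitions.items())
    return (
        "Classify the social media post into one of the labels below.\n"
        f"Labels and definitions:\n{defs}\n\n"
        f"Post: {post}\n"
        "Answer with the label only."
    )

prompt = build_prompt("They are hiding the TRUTH from you!", DEFINITIONS)
print(prompt)
```

Swapping the definition texts in `DEFINITIONS` while keeping everything else fixed is one way to isolate their effect on annotation performance.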
Analyzing the relationship between misinformation, fact-checking and the information ecosystem is essential to understanding the spread of misinformation. Research on these topics relies on heterogeneous data sources, from social media posts to news articles and claims, each with different metadata attached. We introduce Cimple KG, a continuously updated public knowledge graph of misinformation-related content. Cimple KG links various previously published static misinformation datasets with daily updated claim verifications from vetted fact-checking organizations, and augments them with additional information such as named entities and textual features.
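The linking performed by such a knowledge graph can be pictured as subject--predicate--object triples connecting claim reviews, claims and extracted annotations. The sketch below is a schematic stand-in using plain tuples; the identifiers and property names are invented for illustration and do not reflect the actual Cimple KG vocabulary.

```python
# Schematic triple store: (subject, predicate, object).
# All identifiers and property names are made up for illustration;
# they are not the real Cimple KG schema.
triples = {
    ("claim:42", "schema:text", "5G towers spread the virus"),
    ("review:7", "schema:itemReviewed", "claim:42"),
    ("review:7", "schema:reviewRating", "False"),
    ("claim:42", "ex:mentionsEntity", "dbpedia:5G"),
    ("claim:42", "ex:conspiracyScore", "0.92"),
}

def objects_of(subject: str, predicate: str) -> set:
    """All objects linked to `subject` via `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}

# Follow the link from a fact-check review to the entities in the claim.
claim = objects_of("review:7", "schema:itemReviewed").pop()
entities = objects_of(claim, "ex:mentionsEntity")
print(claim, entities)
```

Representing heterogeneous sources in one graph makes such hops, from a verification verdict to the entities of the underlying claim, straightforward to query.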
Lastly, we explore novel textual similarity measures, focusing on fact-checking applications. Notably, we leverage narratives and named entities, and compare documents of different lengths.
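One simple member of this family compares the named entities of two documents. The sketch below uses the overlap coefficient rather than Jaccard similarity so that a short claim is not penalized when matched against a much longer article; it is a hypothetical illustration, not the exact measures we propose.

```python
def entity_overlap(entities_a: set, entities_b: set) -> float:
    """Overlap coefficient: |A intersect B| / min(|A|, |B|).
    Unlike Jaccard, it stays high when a short document's entities
    are fully contained in a much longer one."""
    if not entities_a or not entities_b:
        return 0.0
    return len(entities_a & entities_b) / min(len(entities_a), len(entities_b))

# Hypothetical extracted entities for a short claim and a long article.
claim_entities = {"Pfizer", "COVID-19"}
article_entities = {"Pfizer", "COVID-19", "BioNTech", "EMA", "WHO", "Moderna"}

print(entity_overlap(claim_entities, article_entities))  # prints 1.0
```

Here Jaccard would yield only 2/6, while the overlap coefficient correctly reflects that every entity of the claim appears in the article.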
In summary, our work uses natural language processing tools and knowledge graphs to help fact-checkers in their task. We present several approaches to detect textual features, propose novel similarity measures, and release Cimple KG, a useful resource supporting misinformation research.