Employing transformers and humans for textual-claim verification

Saeed, Mohammed
Thesis

In recent years, there has been a surge of false news spreading among the public. Despite efforts to alleviate "fake news", many challenges remain when building automated fact-checking systems, including the four we discuss in this thesis. First, it is not clear how to bridge the gap between input textual claims, which are to be verified, and the structured data that is to be used for claim verification. We take a step in this direction by introducing Scrutinizer, a data-driven fact-checking system that translates textual claims into SQL queries with the aid of a human-machine interaction component. Second, we enhance the reasoning capabilities of pre-trained language models (PLMs) by introducing RuleBert, a PLM that is fine-tuned on data derived from logical rules. Third, PLMs store vast amounts of information, a key resource in fact-checking applications; still, it is not clear how to access this knowledge efficiently. Several works address this limitation by searching for optimal prompts or relying on external data, but they do not emphasize the expected type of the output. To address this, we propose Type Embeddings (TEs), additional input embeddings that encode the desired output type when querying PLMs. We discuss how to compute a TE and provide several methods for analysis. We then show a boost in performance on the LAMA dataset and promising results for text detoxification. Finally, we analyze the Birdwatch program, a community-driven approach to fact-checking tweets. All in all, the work in this thesis aims at a better understanding of how machines and humans can aid in reinforcing and scaling manual fact-checking.
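To make the Type Embedding idea concrete, the following is a minimal, self-contained sketch. It assumes (this is an illustration, not the thesis' exact formulation) that a TE is built by averaging the PLM input embeddings of example entities sharing the desired output type, and that the TE is then prepended to the token embeddings of the query. A toy random embedding table stands in for a real PLM's embedding matrix.

```python
import numpy as np

# Toy embedding table; in practice this would be the PLM's input
# embedding matrix (assumption for illustration only).
rng = np.random.default_rng(0)
words = ["Paris", "Rome", "Berlin", "The", "capital",
         "of", "France", "is", "[MASK]"]
vocab = {w: rng.normal(size=8) for w in words}

def type_embedding(examples):
    # One plausible way to compute a TE: average the embeddings of
    # example entities of the target type (e.g., cities). The thesis
    # discusses how to compute TEs; this averaging is an assumption.
    return np.mean([vocab[e] for e in examples], axis=0)

def embed_query(tokens, te):
    # Prepend the TE so the model is conditioned on the expected
    # answer type before reading the query tokens.
    return np.vstack([te] + [vocab[t] for t in tokens])

te_city = type_embedding(["Paris", "Rome", "Berlin"])
x = embed_query(["The", "capital", "of", "France", "is", "[MASK]"], te_city)
print(x.shape)  # one extra row (the TE) ahead of the 6 query tokens
```

In a real PLM the extra embedding would be fed through the transformer alongside the token embeddings, steering the `[MASK]` prediction toward entities of the encoded type.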


HAL
Type:
Thesis
Date:
2022-11-07
Department:
Data Science
Eurecom Ref:
7052
Copyright:
© EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in Thesis and is available at:

PERMALINK : https://www.eurecom.fr/publication/7052