Efficient Fact-checking through Supporting Facts Extraction from Large Data Collections

Master thesis (2024)

Authors

K.R. Nanhekhan (Electrical Engineering, Mathematics and Computer Science)

Contributors

A. Anand, Web Information Systems (supervisor 1)

V. Viswanathan, Web Information Systems (supervisor 1)

P.K. Murukannaiah, Interactive Intelligence (supervisor 2)

Faculty

Electrical Engineering, Mathematics and Computer Science


To reference this document use:

http://resolver.tudelft.nl/uuid:990b1567-7c6e-43a1-9832-3d5a9a53c41a

Published Date

24-04-2024

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.


Abstract

Amid the rampant spread of misinformation, fact-checking the diverse claims made on the internet has become an important way to mitigate the problem. Manual fact-checking is cumbersome and cannot scale to meet this demand, so automated fact-checking is needed. However, existing work has focused primarily on the fact-verification step rather than on evidence retrieval from large data collections, which leads to scalability issues in practical applications. In this study, we address this gap by exploring various methods for indexing a succinct set of supporting facts extracted from large data collections and by enhancing the retrieval phase of the fact-checking pipeline. We evaluate both performance and efficiency on the state-of-the-art claim datasets HoVer and WiCE, using the English Wikipedia as a large evidence collection. Overall, our results underscore the effectiveness of integrating supporting facts and advanced retrieval techniques into fact-checking pipelines for practical applications. By combining an index of supporting facts with dense retrieval and index compression, we achieve a substantial efficiency improvement over the original fact-checking pipeline: up to a 10.0x speedup with a CPU-based approach and up to a 20.0x speedup with a GPU-based approach, while incurring a modest accuracy loss of less than 6 points.
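The abstract names two ingredients, dense retrieval over an index of extracted supporting facts and index compression. The following is a minimal, hypothetical sketch of how those two ideas can combine; it is not the thesis's implementation. The hashed bag-of-words `embed` function is an assumed stand-in for a trained neural encoder, and 8-bit scalar quantization is an assumed stand-in for whatever compression scheme the thesis actually uses. All names (`embed`, `quantize`, `CompressedFactIndex`) are illustrative.

```python
import math
import re
import zlib
from array import array
from typing import List


def embed(text: str, dim: int = 256) -> List[float]:
    """Hypothetical stand-in for a dense encoder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        # crc32 gives a deterministic bucket, unlike Python's salted hash().
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
    return [x / norm for x in vec]


def quantize(vec: List[float]) -> array:
    """Compress a unit vector to signed 8-bit codes (1 byte per dimension)."""
    return array("b", (max(-127, min(127, round(x * 127))) for x in vec))


class CompressedFactIndex:
    """Hypothetical index of supporting facts, stored only as quantized vectors."""

    def __init__(self, dim: int = 256) -> None:
        self.dim = dim
        self.facts: List[str] = []
        self.codes: List[array] = []

    def add(self, fact: str) -> None:
        self.facts.append(fact)
        self.codes.append(quantize(embed(fact, self.dim)))

    def search(self, claim: str, k: int = 2) -> List[str]:
        query = quantize(embed(claim, self.dim))
        # Integer inner product on the compressed codes approximates cosine similarity.
        scored = [
            (sum(q * c for q, c in zip(query, code)), i)
            for i, code in enumerate(self.codes)
        ]
        scored.sort(reverse=True)
        return [self.facts[i] for _, i in scored[:k]]


if __name__ == "__main__":
    index = CompressedFactIndex()
    for fact in [
        "Delft is a city in the Netherlands.",
        "The Eiffel Tower is in Paris.",
        "Python is a programming language.",
    ]:
        index.add(fact)
    print(index.search("Delft is located in the Netherlands", k=1))
```

Storing each dimension as a signed byte rather than a float shrinks the index at some cost in ranking precision, which mirrors in miniature the efficiency/accuracy trade-off reported above.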

Files

MSc_Efficient_fact_checking_Ke... (.pdf | 17.8 MB)