Efficient Fact-checking through Supporting Facts Extraction from Large Data Collections

None, None

Efficient Fact-checking through Supporting Facts Extraction from Large Data Collections

Master Thesis (2024)

Author(s)

K.R. Nanhekhan (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. Anand – Mentor (TU Delft - Web Information Systems)

V. Viswanathan – Mentor (TU Delft - Web Information Systems)

P.K. Murukannaiah – Graduation committee member (TU Delft - Interactive Intelligence)

Faculty

Electrical Engineering, Mathematics and Computer Science

Supporting facts extraction Large Evidence Collections Efficient fact-checking Concise fact index Index Compression

To reference this document use:

https://resolver.tudelft.nl/uuid:990b1567-7c6e-43a1-9832-3d5a9a53c41a

More Info

expand_more

Publication Year

2024

Language

English

Graduation Date

24-04-2024

Awarding Institution

Delft University of Technology

Programme

['Computer Science']

Faculty

Electrical Engineering, Mathematics and Computer Science

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Amidst the rampant spread of misinformation, fact-checking of diverse claims made on the internet has become a pertinent task to mitigate this problem. Manual fact-checking cannot scale up with this demand and is very cumbersome, therefore instead automated fact-checking can be used. However, existing work has primarily focused on the fact-verification part rather than evidence retrieval for large data collections, leading to scalability issues for practical applications. In this study, we address this gap by exploring various methods for indexing a succinct set of supporting facts extracted from large data collections and enhancing the retrieval phase of the fact-checking pipeline. Our evaluation, consisting of measuring the performance and efficiency, is performed on the state-of-the-art claim datasets HoVer and WiCE, where we utilised the English Wikipedia as a large evidence data collection. Overall our results underscore the effectiveness of integrating supporting facts and advanced retrieval techniques for fact-checking pipelines in practical applications. We achieve, through a combination of indexing supporting facts together with Dense retrieval and Index compression, a massive improvement over the original fact-checking pipeline. This is up to a 10.0x speedup using a CPU-based approach and up to a 20.0x speedup using a GPU-based approach, while only incurring a modest loss of less than 6 points in accuracy.

Files

MSc_Efficient_fact_checking_Ke... (pdf)

(pdf | 17.8 Mb)

License info not available