Efficient Fact-checking through Supporting Facts Extraction from Large Data Collections

Amid the rampant spread of misinformation, fact-checking the diverse claims made on the internet has become an important task. Manual fact-checking is cumbersome and cannot scale with this demand, so automated fact-checking is needed instead. However, existing work has focused primarily on fact verification rather than on evidence retrieval from large data collections, which leads to scalability issues in practical applications. In this study, we address this gap by exploring various methods for indexing a succinct set of supporting facts extracted from large data collections and for enhancing the retrieval phase of the fact-checking pipeline. Our evaluation, which measures both performance and efficiency, is performed on the state-of-the-art claim datasets HoVer and WiCE, with the English Wikipedia utilised as a large evidence data collection. Overall, our results underscore the effectiveness of integrating supporting facts and advanced retrieval techniques into fact-checking pipelines for practical applications. By combining the indexing of supporting facts with dense retrieval and index compression, we achieve a substantial improvement over the original fact-checking pipeline: up to a 10.0x speedup using a CPU-based approach and up to a 20.0x speedup using a GPU-based approach, while incurring only a modest loss of less than 6 points in accuracy.
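
The retrieval idea summarised above (index a small set of supporting facts, retrieve by vector similarity, and store the vectors in compressed form) can be illustrated with a minimal sketch. It is not the implementation evaluated in this study. The hashed bag-of-words `embed` function is only a stand-in for a neural dense encoder, 8-bit scalar quantisation is one simple form of index compression chosen for illustration, and all names (`embed`, `quantize`, `build_index`, `retrieve`, `DIM`) are hypothetical.

```python
import hashlib
import math
import re

DIM = 64  # toy embedding size (assumption; real encoders use hundreds of dimensions)


def embed(text):
    """Toy stand-in for a dense encoder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # Hash each token into one of DIM buckets and count occurrences.
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def quantize(vec):
    """Compress a unit-length float vector to 8-bit integer codes."""
    return [round(v * 127) for v in vec]


def build_index(facts):
    """Index only the supporting facts, storing compressed vectors."""
    return [(fact, quantize(embed(fact))) for fact in facts]


def retrieve(claim, index, k=2):
    """Return the k facts whose compressed vectors best match the claim."""
    query = embed(claim)
    scored = [
        (sum(q * c for q, c in zip(query, codes)) / 127.0, fact)
        for fact, codes in index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [fact for _, fact in scored[:k]]


if __name__ == "__main__":
    facts = [
        "Paris is the capital of France",
        "The Nile is a river in Africa",
        "Mount Everest is the highest mountain on Earth",
    ]
    index = build_index(facts)
    print(retrieve("Paris is the capital city of France", index, k=1))
```

In this sketch each stored vector is a list of small integers rather than floats, which shows the trade-off described in the abstract: a smaller, faster index at the cost of a small approximation error in the similarity scores.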