Improving reasoning of Large Language Models for fact checking real-world complex claims
P. Chungkham (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Avishek Anand – Mentor (Leibniz Universität)
Kubilay Atasu – Graduation committee member (TU Delft - Data-Intensive Systems)
Abstract
The growing volume of online misinformation has increased the demand for automated fact-checking systems. While large language models (LLMs) have demonstrated potential in this domain, real-world claims are complex in ways that challenge standard prompting techniques. Such claims are multi-aspect, requiring the integration of evidence from different sources. A particularly challenging subset consists of claims with Conflicting labels, where different parts of the claim support opposing veracity labels. Both cases demand nuanced, context-sensitive reasoning that LLMs struggle to perform reliably. This thesis investigates two methods to enhance LLM reasoning for fact-checking: claim decomposition and test-time scaling. Claim decomposition breaks complex claims down into simpler sub-questions, promoting more structured reasoning. While this improves performance on Conflicting claims, it can degrade accuracy on straightforward claims, particularly those labeled True. To mitigate this, an adaptive decomposition strategy is proposed that applies decomposition only when it is beneficial. A characteristic reasoning failure, termed overthinking, is identified, in which the model becomes unnecessarily strict due to noisy evidence or overly specific sub-questions. To address this issue further, test-time scaling with a reward model is employed to rank candidate outputs.
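As a rough illustration of the adaptive decomposition idea, the sketch below decomposes a claim into sub-questions only when a lightweight complexity check suggests it is multi-aspect, and verifies it directly otherwise. The prompts, the `looks_multi_aspect` heuristic, the aggregation rule, and the generic `llm` callable are all illustrative assumptions, not the prompts or gating rule used in the thesis.

```python
from typing import Callable, List

# `llm` stands in for any text-in/text-out LLM backend; the prompts and the
# complexity heuristic below are illustrative, not the thesis's exact setup.

DECOMPOSE_PROMPT = (
    "Break the following claim into independent yes/no sub-questions, "
    "one per line:\nClaim: {claim}\nSub-questions:"
)
VERIFY_PROMPT = (
    "Given the evidence, answer the question with True, False, or Conflicting.\n"
    "Evidence: {evidence}\nQuestion: {question}\nAnswer:"
)

def looks_multi_aspect(claim: str) -> bool:
    """Crude proxy for claim complexity: several conjunctions or clauses."""
    markers = [" and ", " but ", " while ", ",", ";"]
    return sum(claim.lower().count(m) for m in markers) >= 2

def adaptive_fact_check(claim: str, evidence: str,
                        llm: Callable[[str], str]) -> str:
    """Decompose only when the claim appears multi-aspect; else verify directly."""
    if not looks_multi_aspect(claim):
        return llm(VERIFY_PROMPT.format(evidence=evidence, question=claim)).strip()

    # Decompose the claim into sub-questions and verify each one separately.
    sub_questions: List[str] = [
        q.strip("- ").strip()
        for q in llm(DECOMPOSE_PROMPT.format(claim=claim)).splitlines()
        if q.strip()
    ]
    sub_answers = [
        llm(VERIFY_PROMPT.format(evidence=evidence, question=q)).strip()
        for q in sub_questions
    ]
    # Aggregate: all True -> True, all False -> False, otherwise Conflicting.
    if all(a.lower().startswith("true") for a in sub_answers):
        return "True"
    if all(a.lower().startswith("false") for a in sub_answers):
        return "False"
    return "Conflicting"
```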
This approach yields an 18.8% relative improvement in macro F1-score over the baseline and reduces overthinking by encouraging context-aware leniency. Together, these findings underscore the importance of targeted reasoning strategies for improving the robustness and reliability of LLM-based fact-checking.
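Test-time scaling with a reward model is often instantiated as best-of-N reranking; the minimal sketch below follows that pattern under stated assumptions. The `sample` and `reward` callables and the choice of n are placeholders for whichever generator and reward model are actually used, and the procedure is not necessarily the thesis's exact selection scheme.

```python
from typing import Callable, List

def best_of_n_verdict(prompt: str,
                      sample: Callable[[str], str],
                      reward: Callable[[str, str], float],
                      n: int = 8) -> str:
    """Sample n candidate reasoning chains and return the one the reward model scores highest.

    `sample` is any stochastic LLM call (e.g. temperature > 0) and `reward`
    scores a (prompt, candidate) pair; both are assumed interfaces.
    """
    candidates: List[str] = [sample(prompt) for _ in range(n)]
    scores = [reward(prompt, c) for c in candidates]
    return candidates[max(range(n), key=scores.__getitem__)]
```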