Improving reasoning of Large Language Models for fact checking real-world complex claims
P. Chungkham (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Avishek Anand – Mentor (Leibniz Universität)
Kubilay Atasu – Graduation committee member (TU Delft - Data-Intensive Systems)
Abstract
The growing volume of online misinformation has increased the demand for automated fact-checking systems. While large language models (LLMs) have demonstrated potential in this domain, real-world claims are complex in ways that challenge standard prompting techniques. Such claims are multi-aspect, requiring the integration of evidence from different sources. A particularly challenging subset consists of claims with Conflicting labels, where different parts of the claim support opposing veracity labels. Both cases demand nuanced, context-sensitive reasoning that LLMs struggle to perform reliably. This thesis investigates two methods to enhance LLM reasoning for fact-checking: claim decomposition and test-time scaling. Claim decomposition breaks complex claims down into simpler sub-questions, promoting more structured reasoning. While this improves performance on Conflicting claims, it can degrade accuracy on straightforward claims, particularly those labeled True. To mitigate this, an adaptive decomposition strategy is proposed that applies decomposition only when it is beneficial. A characteristic reasoning failure, termed overthinking, is identified, in which the model becomes unnecessarily strict due to noisy evidence or overly specific sub-questions. To address this issue further, test-time scaling with a reward model is employed to rank candidate outputs.
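As a rough illustration of the adaptive decomposition idea, the sketch below decomposes a claim into sub-questions only when a lightweight complexity check suggests it is multi-aspect, and verifies it directly otherwise. The prompts, the `looks_multi_aspect` heuristic, the aggregation rule, and the generic `llm` callable are all illustrative assumptions, not the prompts or gating rule used in the thesis.

```python
from typing import Callable, List

# `llm` stands in for any text-in/text-out LLM backend; the prompts and the
# complexity heuristic below are illustrative, not the thesis's exact setup.

DECOMPOSE_PROMPT = (
    "Break the following claim into independent yes/no sub-questions, "
    "one per line:\nClaim: {claim}\nSub-questions:"
)
VERIFY_PROMPT = (
    "Given the evidence, answer the question with True, False, or Conflicting.\n"
    "Evidence: {evidence}\nQuestion: {question}\nAnswer:"
)

def looks_multi_aspect(claim: str) -> bool:
    """Crude proxy for claim complexity: several conjunctions or clauses."""
    markers = [" and ", " but ", " while ", ",", ";"]
    return sum(claim.lower().count(m) for m in markers) >= 2

def adaptive_fact_check(claim: str, evidence: str,
                        llm: Callable[[str], str]) -> str:
    """Decompose only when the claim appears multi-aspect; else verify directly."""
    if not looks_multi_aspect(claim):
        return llm(VERIFY_PROMPT.format(evidence=evidence, question=claim)).strip()

    # Decompose the claim into sub-questions and verify each one separately.
    sub_questions: List[str] = [
        q.strip("- ").strip()
        for q in llm(DECOMPOSE_PROMPT.format(claim=claim)).splitlines()
        if q.strip()
    ]
    sub_answers = [
        llm(VERIFY_PROMPT.format(evidence=evidence, question=q)).strip()
        for q in sub_questions
    ]
    # Aggregate: all True -> True, all False -> False, otherwise Conflicting.
    if all(a.lower().startswith("true") for a in sub_answers):
        return "True"
    if all(a.lower().startswith("false") for a in sub_answers):
        return "False"
    return "Conflicting"
```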
This approach yields an 18.8% relative improvement in macro F1-score over the baseline and reduces overthinking by encouraging context-aware leniency. Together, these findings underscore the importance of targeted reasoning strategies for improving the robustness and reliability of LLM-based fact-checking.
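Test-time scaling with a reward model is often instantiated as best-of-N reranking; the minimal sketch below follows that pattern under stated assumptions. The `sample` and `reward` callables and the choice of n are placeholders for whichever generator and reward model are actually used, and the procedure is not necessarily the thesis's exact selection scheme.

```python
from typing import Callable, List

def best_of_n_verdict(prompt: str,
                      sample: Callable[[str], str],
                      reward: Callable[[str, str], float],
                      n: int = 8) -> str:
    """Sample n candidate reasoning chains and return the one the reward model scores highest.

    `sample` is any stochastic LLM call (e.g. temperature > 0) and `reward`
    scores a (prompt, candidate) pair; both are assumed interfaces.
    """
    candidates: List[str] = [sample(prompt) for _ in range(n)]
    scores = [reward(prompt, c) for c in candidates]
    return candidates[max(range(n), key=scores.__getitem__)]
```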