Exploring Human-AI Synergy for Complex Claim Verification
S. Mukherjee (TU Delft - Interactive Intelligence)
C.M. Jonker (TU Delft - Interactive Intelligence)
P.K. Murukannaiah (TU Delft - Interactive Intelligence)
Abstract
Combating widespread misinformation requires scalable and reliable fact-checking methods. Fact-checking involves several steps, including question generation, evidence retrieval, and veracity prediction. Importantly, fact-checking is well suited to hybrid intelligence, since it requires both human expertise and AI’s capacity for large-scale information processing. Thus, constructing an effective fact-checking pipeline requires a systematic understanding of the relative strengths and weaknesses of humans and AI at each step of the process. We investigate the ability of LLMs to perform the first step, i.e., to generate pertinent questions for analyzing a claim. To evaluate the quality of the LLM-generated questions, we crowdsource a dataset in which 150 claims are annotated with questions that (1) a novice fact-checker and (2) a professional fact-checker would ask when fact-checking those claims. We then study the effects of the human- and LLM-generated questions on evidence retrieval and veracity prediction. We find that LLMs can generate nuanced questions for verifying a complex claim, but the final label prediction depends on the quality of the evidence corpus: evidence collected by automated methods yields lower accuracy in veracity prediction than evidence curated by experts.
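To make the first pipeline step concrete, the sketch below shows one way an LLM could be prompted to generate verification questions for a claim. This is a minimal illustration, not the authors' implementation: it assumes the OpenAI Python client, an API key in the environment, and an illustrative model name; the prompt wording and the helper generate_questions are hypothetical.

```python
# Minimal sketch (assumed setup, not the paper's pipeline): prompt an LLM to
# generate fact-checking questions for a claim.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_questions(claim: str, n_questions: int = 5) -> list[str]:
    """Ask the model for questions a fact-checker might pose about `claim`."""
    prompt = (
        f'You are fact-checking the following claim:\n"{claim}"\n'
        f"List {n_questions} specific questions whose answers would help "
        "verify or refute the claim. Return one question per line."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    text = response.choices[0].message.content
    # Split the reply into individual questions, dropping list markers.
    return [line.strip("- ").strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    claim = "Electric cars produce more CO2 over their lifetime than petrol cars."
    for q in generate_questions(claim):
        print(q)
```

The questions produced this way would then feed the later steps of the pipeline, i.e., retrieving evidence that answers each question and predicting the claim's veracity from that evidence.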