Large Language Models (LLMs) such as GPT-4 and LLaMA have demonstrated promising performance in fact-checking tasks, particularly in labeling the veracity of claims. However, the real-world utility of such fact-checking systems depends not only on label accuracy but also on the faithfulness of the justifications they provide. Prior work has explored various prompting strategies to elicit reasoning from LLMs, but most studies evaluate these styles in isolation or focus solely on veracity classification, neglecting their impact on explanation quality. This study addresses that gap by investigating how different prompt styles affect both the accuracy and the faithfulness of LLM-generated claim labels and justifications. Seven established prompting strategies, including Chain-of-Thought, Role-Based, and Decompose-and-Verify, were tested across two datasets (QuanTemp and HoVer) using two efficient models: LLaMA 3.1:8B and GPT-4o-mini. Additionally, two novel prompt variants were introduced, and all styles were tested under three label conditions to assess bias and explanation drift.