LLM-augmented counterfactual explanations
Improving faithfulness and user-preference alignment
A. Hasami (TU Delft - Electrical Engineering, Mathematics and Computer Science)
M. Mansoury – Mentor (TU Delft - Multimedia Computing)
A. Hanjalic – Mentor (TU Delft - Intelligent Systems)
Maria Soledad Pera – Graduation committee member (TU Delft - Web Information Systems)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Counterfactual explanations (CFEs) offer a tangible and actionable way to explain recommendations by showing users a "what-if" scenario that demonstrates how small changes in their history would alter the system's output. However, existing CFE methods are susceptible to bias, generating explanations that may misalign with users' actual preferences. In this thesis, we study ACCENT, a neural CFE framework, and analyze its behavior through the lens of popularity bias. We introduce two alignment metrics, popularity distribution similarity (PDS) and expected popularity deviation (EPD), and evaluate 736 users with strongly niche- or blockbuster-oriented histories on MovieLens 1M and Amazon Video Games. The analysis shows that ACCENT's explanations are systematically misaligned with users' historical popularity preferences. To address this, we propose a pre-processing step that leverages large language models (LLMs) to identify and filter out-of-character history items before generating explanations. Compared to simple heuristics and embedding-based filters, LLM-based filtering yields counterfactuals that are more closely aligned with each user's popularity preferences, while preserving explanation conciseness and fidelity. A comparison between 4B- and 8B-parameter models further reveals that larger LLMs provide more stable, instruction-following behavior and stronger alignment, at the cost of increased computational overhead.
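The core idea behind popularity-alignment metrics such as PDS and EPD can be illustrated with a minimal sketch. The definitions below are illustrative assumptions, not the thesis's actual formulations: popularity is taken as an item's normalized interaction count, "PDS" is approximated as the histogram overlap between the popularity values of the user's history and those of the counterfactual explanation, and "EPD" as the absolute difference in mean popularity between the two sets. The function names and the toy interaction format are hypothetical.

```python
from collections import Counter
import numpy as np

def popularity_scores(interactions):
    """Assumed popularity: each item's share of all (user, item) interactions."""
    counts = Counter(item for _, item in interactions)
    total = sum(counts.values())
    return {item: c / total for item, c in counts.items()}

def pds_sketch(history_items, cfe_items, pop, bins=10):
    """Illustrative PDS: overlap of popularity histograms (1.0 = identical)."""
    h, _ = np.histogram([pop[i] for i in history_items], bins=bins, range=(0.0, 1.0))
    c, _ = np.histogram([pop[i] for i in cfe_items], bins=bins, range=(0.0, 1.0))
    h = h / h.sum()
    c = c / c.sum()
    # Overlap of two normalized histograms: sum of bin-wise minima.
    return float(np.minimum(h, c).sum())

def epd_sketch(history_items, cfe_items, pop):
    """Illustrative EPD: |mean popularity of explanation - mean popularity of history|."""
    return abs(float(np.mean([pop[i] for i in cfe_items]))
               - float(np.mean([pop[i] for i in history_items])))

# Toy usage: a counterfactual drawn from the user's own history aligns perfectly.
interactions = [("u1", "a"), ("u1", "b"), ("u2", "a")]
pop = popularity_scores(interactions)          # e.g. pop["a"] = 2/3, pop["b"] = 1/3
print(pds_sketch(["a", "b"], ["a", "b"], pop))  # 1.0 for identical distributions
print(epd_sketch(["a", "b"], ["a", "b"], pop))  # 0.0 for identical mean popularity
```

Under this reading, a blockbuster-oriented user whose counterfactual consists mostly of niche items would show a low PDS and a high EPD, which is the misalignment the thesis measures.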