LLM-augmented counterfactual explanations
Improving faithfulness and user-preference alignment
A. Hasami (TU Delft - Electrical Engineering, Mathematics and Computer Science)
M. Mansoury – Mentor (TU Delft - Multimedia Computing)
A. Hanjalic – Mentor (TU Delft - Intelligent Systems)
Maria Soledad Pera – Graduation committee member (TU Delft - Web Information Systems)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Counterfactual explanations (CFEs) offer a tangible and actionable way to explain recommendations by showing users a "what-if" scenario that demonstrates how small changes in their history would alter the system's output. However, existing CFE methods are susceptible to bias, generating explanations that may misalign with users' actual preferences. In this thesis, we study ACCENT, a neural CFE framework, and analyze its behavior through the lens of popularity bias. We introduce two alignment metrics, popularity distribution similarity (PDS) and expected popularity deviation (EPD), and evaluate 736 users with strongly niche- or blockbuster-oriented histories on MovieLens 1M and Amazon Video Games. The analysis shows that ACCENT's explanations are systematically misaligned with users' historical popularity preferences. To address this, we propose a pre-processing step that leverages large language models (LLMs) to identify and filter out-of-character history items before generating explanations. Compared to simple heuristics and embedding-based filters, LLM-based filtering yields counterfactuals that are more closely aligned with each user's popularity preferences, while preserving explanation conciseness and fidelity. A comparison between 4B- and 8B-parameter models further reveals that larger LLMs provide more stable, instruction-following behavior and stronger alignment, at the cost of increased computational overhead.
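The core idea behind popularity-alignment metrics such as PDS and EPD can be illustrated with a minimal sketch. The definitions below are illustrative assumptions, not the thesis's actual formulations: popularity is taken as an item's normalized interaction count, "PDS" is approximated as the histogram overlap between the popularity values of the user's history and those of the counterfactual explanation, and "EPD" as the absolute difference in mean popularity between the two sets. The function names and the toy interaction format are hypothetical.

```python
from collections import Counter
import numpy as np

def popularity_scores(interactions):
    """Assumed popularity: each item's share of all (user, item) interactions."""
    counts = Counter(item for _, item in interactions)
    total = sum(counts.values())
    return {item: c / total for item, c in counts.items()}

def pds_sketch(history_items, cfe_items, pop, bins=10):
    """Illustrative PDS: overlap of popularity histograms (1.0 = identical)."""
    h, _ = np.histogram([pop[i] for i in history_items], bins=bins, range=(0.0, 1.0))
    c, _ = np.histogram([pop[i] for i in cfe_items], bins=bins, range=(0.0, 1.0))
    h = h / h.sum()
    c = c / c.sum()
    # Overlap of two normalized histograms: sum of bin-wise minima.
    return float(np.minimum(h, c).sum())

def epd_sketch(history_items, cfe_items, pop):
    """Illustrative EPD: |mean popularity of explanation - mean popularity of history|."""
    return abs(float(np.mean([pop[i] for i in cfe_items]))
               - float(np.mean([pop[i] for i in history_items])))

# Toy usage: a counterfactual drawn from the user's own history aligns perfectly.
interactions = [("u1", "a"), ("u1", "b"), ("u2", "a")]
pop = popularity_scores(interactions)          # e.g. pop["a"] = 2/3, pop["b"] = 1/3
print(pds_sketch(["a", "b"], ["a", "b"], pop))  # 1.0 for identical distributions
print(epd_sketch(["a", "b"], ["a", "b"], pop))  # 0.0 for identical mean popularity
```

Under this reading, a blockbuster-oriented user whose counterfactual consists mostly of niche items would show a low PDS and a high EPD, which is the misalignment the thesis measures.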