Opening the Black Box: Interpretable Remedies for Popularity Bias in Recommender Systems
P. Ahmadov (TU Delft - Electrical Engineering, Mathematics and Computer Science)
M. Mansoury – Mentor (TU Delft - Multimedia Computing)
A. Hanjalic – Mentor (TU Delft - Intelligent Systems)
P.K. Murukannaiah – Mentor (TU Delft - Interactive Intelligence)
Abstract
Popularity bias is a long-standing challenge in recommender systems: a small set of highly popular items dominates recommendations, while the majority of less popular items are overlooked. This imbalance undermines fairness, decreases recommendation diversity, and hampers users’ ability to discover novel or niche content. Although existing mitigation methods address this issue to some extent, they often lack transparency in how they operate: they fix the symptoms without exposing the internal mechanisms that generate the bias, which makes its effects difficult to interpret or control. As modern recommendation models increasingly rely on deep learning-based architectures, which are inherently hard to interpret, this opacity has become a fundamental limitation.
This thesis introduces PopSteer, a novel, interpretable post-processing strategy for analyzing and mitigating popularity bias in deep recommender systems. PopSteer builds on a Sparse Autoencoder (SAE) that converts dense embeddings into a sparse feature space in which individual neurons align with human-readable features. PopSteer consists of three stages: (i) SAE training, (ii) neuron analysis with synthetic data, and (iii) neuron steering. In the training stage, an SAE is attached to the hidden representation of a pretrained model to produce a relatively disentangled feature space, where individual neurons correspond to features. In the neuron analysis stage, two synthetic user sets are passed through the SAE, one favoring popular items and the other favoring unpopular items; each neuron’s alignment with popularity is quantified by computing Cohen’s d on the difference in activations between the two sets. In the steering stage, the activations of the most biased neurons are adjusted.
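The analysis and steering stages can be summarized in a short sketch. The following is a minimal illustration rather than the thesis implementation: it assumes the SAE activations for the two synthetic user sets are already available as NumPy arrays of shape (users, neurons), and the function names, the scaling-based adjustment, and the hyperparameters k and alpha are hypothetical.

```python
import numpy as np

def cohens_d(pop_acts: np.ndarray, unpop_acts: np.ndarray) -> np.ndarray:
    """Per-neuron Cohen's d between the activations of the popularity-favoring
    and unpopularity-favoring synthetic user sets (arrays: users x neurons)."""
    n1, n2 = len(pop_acts), len(unpop_acts)
    mean_diff = pop_acts.mean(axis=0) - unpop_acts.mean(axis=0)
    pooled_var = ((n1 - 1) * pop_acts.var(axis=0, ddof=1) +
                  (n2 - 1) * unpop_acts.var(axis=0, ddof=1)) / (n1 + n2 - 2)
    return mean_diff / np.sqrt(pooled_var + 1e-12)  # eps guards inactive neurons

def steer(acts: np.ndarray, d: np.ndarray, k: int = 32, alpha: float = 0.5) -> np.ndarray:
    """Dampen the k neurons whose activations are most aligned with popularity
    (largest positive d); alpha in [0, 1] sets the intervention strength."""
    biased = np.argsort(-d)[:k]          # top-k popularity-aligned neurons
    steered = acts.copy()
    steered[:, biased] *= (1.0 - alpha)  # one plausible adjustment: scaling down
    return steered
```

Selecting neurons by effect size rather than raw activation magnitude is what keeps the intervention targeted: as the results below indicate, which neurons are steered matters more than how many.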
Experimental results show that PopSteer consistently increases exposure fairness with only minor accuracy degradation compared to state-of-the-art baselines. The effect is stronger when synthetic user sets, rather than real data, are used in the neuron analysis stage, because their extreme preference patterns better isolate the bias. Furthermore, the neuron-level analysis provides insight into how popularity bias emerges within model embeddings and validates the interpretability of individual neurons. A sensitivity analysis shows that the hyperparameters have a predictable but asymmetric effect on accuracy and fairness: the number of steered neurons matters less than selecting the right neurons, which keeps the intervention focused and efficient. Inference and training costs remain modest, indicating deployability. Overall, the results indicate that PopSteer offers an effective and interpretable way to reduce popularity bias in deep recommender systems while keeping accuracy loss manageable.