M. Mansoury
Path-Level Explainability in Knowledge Graph Recommenders
Decoding Recommendations: Insights from Knowledge Graph Paths
The analysis investigates differences in explanation characteristics across paradigms. It finds that models such as TMER achieve high recommendation accuracy through constrained path structures, whereas reinforcement-learning-based models, including TPRec and PGPR, generate explanations with stronger lexical alignment to user review rationales. The study also examines explanation quality for correctly recommended (relevant) and incorrectly recommended (irrelevant) items. The results show that metrics measuring consistency between explanations and the ground truth (Precision, Recall, and F1) exhibit greater disparities between correctly and incorrectly recommended items than the other evaluation metrics do. For RippleNet, the study analyzes how ripple set size and explanation-oriented neighbor sampling strategies affect recommendation performance and explanation quality. Non-uniform sampling guided by temporal relevance, popularity, or diversity effectively shapes ripple sets to enhance explanation properties without significantly affecting overall recommendation accuracy.
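Non-uniform ripple set construction can be illustrated with a small weighted sampler. The sketch below is hypothetical: the scoring rules (log-damped popularity, exponential time decay with an assumed `half_life`, inverse relation frequency for diversity) and the names `sampling_weights` and `sample_ripple_set` are illustrative assumptions, not the thesis's exact strategies. For simplicity it samples with replacement.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# A knowledge-graph triple (head, relation, tail); entity ids are plain strings here.
Triple = Tuple[str, str, str]


def sampling_weights(
    triples: Sequence[Triple],
    popularity: Dict[str, float],
    timestamps: Dict[str, float],
    strategy: str,
    now: float = 0.0,
    half_life: float = 30.0,
) -> List[float]:
    """Score each candidate neighbor triple; all scoring rules are illustrative assumptions."""
    relation_counts = Counter(rel for _, rel, _ in triples)
    weights = []
    for _, rel, tail in triples:
        if strategy == "uniform":
            w = 1.0
        elif strategy == "popularity":
            # Favor frequently interacted tail entities (log-damped count).
            w = math.log1p(popularity.get(tail, 0.0))
        elif strategy == "temporal":
            # Favor recent tails: the weight halves every `half_life` time units of age.
            age = now - timestamps.get(tail, now)
            w = 0.5 ** (age / half_life)
        elif strategy == "diversity":
            # Favor rare relation types so sampled paths cover more kinds of relations.
            w = 1.0 / relation_counts[rel]
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        weights.append(w)
    return weights


def sample_ripple_set(
    triples: Sequence[Triple], weights: Sequence[float], size: int, rng: random.Random
) -> List[Triple]:
    """Draw a fixed-size ripple set, with replacement, in proportion to `weights`."""
    if sum(weights) <= 0:
        weights = [1.0] * len(triples)  # fall back to uniform sampling
    return rng.choices(list(triples), weights=weights, k=size)
```

Swapping the `strategy` argument changes which neighbors populate a hop's ripple set while the downstream RippleNet model stays unchanged, which is the lever the record describes.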
The findings highlight trade-offs between recommendation accuracy and explanation quality, demonstrating that careful model design and sampling strategies can produce interpretable, user-aligned recommendations while maintaining high performance. These insights provide guidance for developing knowledge graph recommender systems (KGRS) capable of delivering accurate predictions accompanied by semantically rich, temporally relevant, and user-preferred explanations, thereby improving transparency, trust, and user satisfaction in real-world applications.
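This record compares explanations with user review rationales using Precision, Recall, and F1. The following is a minimal sketch, assuming both the explanation and the rationale are reduced to sets of lower-cased terms; the thesis's actual tokenization and matching may differ, and `explanation_prf` and `mean_f1` are hypothetical names.

```python
from typing import Iterable, Sequence, Tuple


def explanation_prf(
    explanation_terms: Iterable[str], rationale_terms: Iterable[str]
) -> Tuple[float, float, float]:
    """Set-overlap precision, recall, and F1 between explanation and rationale terms."""
    pred = {t.lower() for t in explanation_terms}
    gold = {t.lower() for t in rationale_terms}
    overlap = len(pred & gold)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def mean_f1(pairs: Sequence[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Average F1 over (explanation_terms, rationale_terms) pairs."""
    if not pairs:
        return 0.0
    return sum(explanation_prf(e, r)[2] for e, r in pairs) / len(pairs)
```

The relevant-versus-irrelevant disparity the record reports corresponds to a difference such as `mean_f1(relevant_pairs) - mean_f1(irrelevant_pairs)`.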
Algorithmic Bias in Recommender Systems
Investigating the behavior of Recommender Systems bias and fairness interventions
Bridging the Semantic-Collaborative Gap
Unified Item Quantization for LLM-based Generative Recommendation
To bridge this gap, we propose the Unified Q-Former (UQF), a novel pre-quantization fusion framework designed to explicitly integrate semantic and collaborative signals into a unified item representation before discretization. Inspired by the query-based multimodal alignment of BLIP-2, UQF employs a set of learnable queries, parallel cross-attention over pre-trained item text embeddings and graph-based collaborative embeddings (via LightGCN), and adaptive gated fusion to dynamically extract complementary information from both modalities. To ensure robustness and structure preservation, the framework is optimized using a hybrid contrastive learning objective—incorporating both structural and semantic neighbors—coupled with asymmetric modality dropout.
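The query-based fusion step can be pictured as two parallel cross-attention passes followed by a gate. This is a simplified sketch under stated assumptions: there are no learned key/value projections, the gate is a single scalar per query computed from the concatenated views, and the query outputs are mean-pooled into one item vector. UQF's actual layers, the contrastive objective, and modality dropout are not shown, and all names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: List[Vector], memory: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention of each query over `memory` (keys = values, no projections)."""
    d = len(queries[0])
    outputs = []
    for q in queries:
        attn = softmax([dot(q, k) / math.sqrt(d) for k in memory])
        outputs.append([sum(a * v[j] for a, v in zip(attn, memory)) for j in range(d)])
    return outputs


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def uqf_item_vector(
    queries: List[Vector],
    text_memory: List[Vector],
    collab_memory: List[Vector],
    gate_w: Vector,
    gate_b: float,
) -> Vector:
    """Parallel cross-attention per modality, gated fusion per query, then mean pooling."""
    sem = cross_attend(queries, text_memory)      # semantic view from text embeddings
    col = cross_attend(queries, collab_memory)    # collaborative view from graph embeddings
    fused = []
    for s, c in zip(sem, col):
        g = sigmoid(dot(gate_w, s + c) + gate_b)  # scalar gate from the concatenated views
        fused.append([g * si + (1.0 - g) * ci for si, ci in zip(s, c)])
    d = len(fused[0])
    return [sum(f[j] for f in fused) / len(fused) for j in range(d)]
```

A gate near 1 keeps the semantic view and a gate near 0 keeps the collaborative one. Because the gate is computed per query from both views, the fusion can adapt item by item.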
The resulting unified representations are quantized into discrete semantic IDs (SIDs) via residual vector quantization (RQ-VAE) and used as target generation tokens for a downstream LLM recommender. Extensive experiments on two real-world Amazon Review datasets (Office Products and Musical Instruments) demonstrate that UQF consistently improves the performance of state-of-the-art LC-Rec and TIGER-style generative recommendation backbones. Our framework outperforms strong traditional, sequential, and recent unified generative baselines, yielding highly interpretable, hierarchical SID structures with significantly improved semantic and collaborative consistency.
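The quantization step that turns a fused item vector into an SID can be sketched with fixed codebooks. An RQ-VAE also learns an encoder, a decoder, and the codebooks themselves; this sketch covers only the residual lookup. The `<a_1><b_1>` token format and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(
    x: Sequence[float], codebooks: Sequence[Sequence[Vector]]
) -> Tuple[List[int], Vector]:
    """Pick one codeword per level; each level quantizes the residual left by earlier levels."""
    residual = list(x)
    recon = [0.0] * len(x)
    sid = []
    for codebook in codebooks:
        idx = min(range(len(codebook)), key=lambda i: sq_dist(residual, codebook[i]))
        sid.append(idx)
        recon = [r + c for r, c in zip(recon, codebook[idx])]
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return sid, recon


def sid_tokens(sid: Sequence[int]) -> List[str]:
    """Hypothetical token format: one level-tagged token per code, e.g. <a_1><b_1>."""
    return [f"<{chr(ord('a') + level)}_{code}>" for level, code in enumerate(sid)]
```

Because each level refines the previous one, items that share a prefix of codes are close in the fused space. This is what gives the SIDs their hierarchical structure.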
Comparative Analysis of Recommendation Models on Scopus Data
Unveiling Patterns in Sparse Interactions for Academic Discovery
The study begins with an extensive exploratory data analysis (EDA) of 2024 Scopus interaction logs, comprising over 31 million user-item events. A novel data transformation pipeline is developed to convert implicit feedback signals, such as downloads, views, and exports, into continuous-valued preference scores that are suitable for collaborative filtering models. This enables the application of state-of-the-art algorithms despite the absence of explicit ratings.
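One way to turn implicit events into continuous preference scores is a weighted count, log damping, and per-user rescaling. This sketch is an assumption: the event weights in `EVENT_WEIGHTS` and the scaling scheme are illustrative, not the study's pipeline.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical weights: stronger actions signal stronger interest. Not the study's values.
EVENT_WEIGHTS: Dict[str, float] = {"view": 1.0, "download": 3.0, "export": 4.0}


def preference_scores(
    events: Iterable[Tuple[str, str, str]],
    weights: Dict[str, float] = EVENT_WEIGHTS,
) -> Dict[Tuple[str, str], float]:
    """Map (user, item, event_type) logs to per-user preference scores in [0, 1]."""
    raw: Dict[Tuple[str, str], float] = defaultdict(float)
    for user, item, event_type in events:
        raw[(user, item)] += weights.get(event_type, 0.0)

    # Largest log-damped total per user, used to rescale that user's scores.
    user_max: Dict[str, float] = defaultdict(float)
    for (user, _), total in raw.items():
        user_max[user] = max(user_max[user], math.log1p(total))

    return {
        (user, item): (math.log1p(total) / user_max[user] if user_max[user] > 0 else 0.0)
        for (user, item), total in raw.items()
    }
```

Log damping keeps a user who repeatedly views one paper from dominating the score. Per-user rescaling makes very active and occasional users comparable.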
Four recommendation models are implemented and compared: Bayesian Personalized Ranking (BPR), Factored Item Similarity Model (FISM), Light Graph Convolutional Network (LightGCN), and Knowledge Graph Attention Network (KGAT). Model evaluation uses both traditional offline ranking metrics (Recall@10, Precision@10, NDCG@10, MRR@10, Hit Rate@10) and a novel Large Language Model (LLM)-based evaluation framework that uses GPT-4o to assess relevance and serendipity semantically.
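For reference, the listed offline metrics can be computed per user under binary relevance as follows. This is a common formulation, assumed here; the study's exact conventions (for example, how Precision@k handles lists shorter than k) may differ. Dataset-level scores are the averages over users.

```python
import math
from typing import Dict, Sequence, Set


def ranking_metrics_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> Dict[str, float]:
    """Binary-relevance Recall, Precision, NDCG, MRR, and Hit Rate at cutoff k for one user."""
    hits = [1 if item in relevant else 0 for item in ranked[:k]]
    n_hits = sum(hits)
    dcg = sum(h / math.log2(rank + 2) for rank, h in enumerate(hits))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant), k)))
    first_hit = next((rank for rank, h in enumerate(hits) if h), None)
    return {
        f"Recall@{k}": n_hits / len(relevant) if relevant else 0.0,
        f"Precision@{k}": n_hits / k,
        f"NDCG@{k}": dcg / idcg if idcg > 0 else 0.0,
        f"MRR@{k}": 1.0 / (first_hit + 1) if first_hit is not None else 0.0,
        f"HitRate@{k}": 1.0 if n_hits else 0.0,
    }
```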
Results show that LightGCN consistently outperforms the other models in both personalized and non-personalized scenarios, achieving the highest accuracy and scalability. Non-personalized recommendations remain valuable in cold-start and anonymous-browsing contexts. LLM-based evaluation offers deeper qualitative insight into recommendation quality, capturing semantic alignment and novelty beyond what traditional metrics reflect. The proposed framework demonstrates that a unified embedding-based architecture can effectively serve heterogeneous recommendation needs on large-scale scholarly platforms. The methodology and findings have broader implications for the design of academic recommender systems in data-sparse, mixed-user environments.
Enhancing Privacy of Course Recommendation Systems
A Privacy-Focused Matrix Factorization Approach
This thesis enhances a Homomorphic-Encryption-based recommendation protocol to support biased Matrix Factorization through two additions: data centering and vector augmentation. These modifications maintain the security guarantees of the original protocol under the semi-honest adversary model while enabling the model to incorporate user and item biases. Evaluated in the plaintext domain on the MovieLens-100k dataset, the enhanced model achieved a test RMSE of 0.9213, a notable improvement over the baseline's 0.9507, and reached the baseline’s best RMSE with only 15 training iterations instead of 145. Beyond accuracy and efficiency, separating bias terms from the student–course interaction extends the system from a simple grade predictor into a tool for academic discovery, allowing for recommendations that consider inherent compatibility, not solely predicted grades. Although demonstrated in a course-recommendation setting, the approach is applicable to any privacy-preserving recommender system, offering reduced computational costs and narrowing the accuracy gap with non-private methods.
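Data centering and vector augmentation work together so that an unbiased dot-product protocol computes a biased model. Center the ratings by subtracting the global mean μ, then extend each user vector to [p_u, b_u, 1] and each item vector to [q_i, 1, b_i]. The dot product of the extended vectors is then p_u·q_i + b_u + b_i. The sketch below assumes this standard layout (the thesis's exact construction may differ) and runs in plaintext only; the encrypted protocol is not shown, and all names are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def augment_user(p: Vector, b_u: float) -> Vector:
    # [p_u, b_u, 1]: the trailing 1 picks up the item bias in the dot product.
    return p + [b_u, 1.0]


def augment_item(q: Vector, b_i: float) -> Vector:
    # [q_i, 1, b_i]: the 1 picks up the user bias in the dot product.
    return q + [1.0, b_i]


def predict(mu: float, p_aug: Vector, q_aug: Vector) -> float:
    # mu + <p_aug, q_aug> = mu + b_u + b_i + <p_u, q_i>
    return mu + dot(p_aug, q_aug)


def sgd_step(
    p_aug: Vector, q_aug: Vector, rating: float, mu: float, lr: float = 0.01, reg: float = 0.02
) -> Tuple[Vector, Vector]:
    """One plaintext SGD step on the centered rating (rating - mu); constant slots stay 1.0."""
    err = (rating - mu) - dot(p_aug, q_aug)
    k = len(p_aug) - 2  # number of latent factors
    p, q = p_aug[:k], q_aug[:k]
    new_p = [pf + lr * (err * qf - reg * pf) for pf, qf in zip(p, q)]
    new_q = [qf + lr * (err * pf - reg * qf) for pf, qf in zip(p, q)]
    b_u = p_aug[k] + lr * (err - reg * p_aug[k])
    b_i = q_aug[k + 1] + lr * (err - reg * q_aug[k + 1])
    return augment_user(new_p, b_u), augment_item(new_q, b_i)
```

The only protocol-level change is two extra vector slots. The constant slots must be held at 1.0 during training, because otherwise the bias terms would be rescaled.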
Fairness in Collaborative Filtering Recommender Systems
A Comparative Analysis of Trade-offs Across Model Architectures
This study investigates how collaborative filtering architectures affect both accuracy and fairness. We evaluate six models, including two non-personalized baselines, across two public datasets using a unified pipeline without fairness-specific interventions.
Our results reveal a general trade-off: models with higher accuracy often exhibit greater fairness disparities, particularly on the user side. For example, LightGCN combines strong accuracy with relatively high item-side fairness, whereas SLIMElastic ranks high in accuracy but exhibits greater unfairness. However, this trade-off is not uniform across datasets: NeuMF degrades notably on sparser data.
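The record does not name its fairness metrics. Two common choices serve as assumed stand-ins here: a Gini coefficient over item exposure in top-k lists (item side) and the gap in mean accuracy between two user groups (user side). The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Set


def exposure_gini(rec_lists: Iterable[Sequence[str]], catalog: Sequence[str]) -> float:
    """Gini coefficient of how often each catalog item appears in the top-k lists (0 = equal)."""
    counts = Counter(item for recs in rec_lists for item in recs)
    values = sorted(counts.get(item, 0) for item in catalog)
    n, total = len(values), sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Gini = sum_i (2i - n - 1) * x_i / (n * sum_j x_j), x sorted ascending, i = 1..n
    return sum((2 * i - n - 1) * x for i, x in enumerate(values, start=1)) / (n * total)


def user_group_gap(metric_by_user: Dict[str, float], group_a: Set[str]) -> float:
    """Absolute difference in mean accuracy between users in group_a and everyone else."""
    a = [v for u, v in metric_by_user.items() if u in group_a]
    b = [v for u, v in metric_by_user.items() if u not in group_a]
    if not a or not b:
        return 0.0
    return abs(sum(a) / len(a) - sum(b) / len(b))
```

Under this reading, a lower Gini means higher item-side fairness, and a smaller group gap means less user-side disparity.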
These findings demonstrate that model architecture alone can shape fairness–accuracy trade-offs, highlighting the importance of considering dataset characteristics and model design when selecting or developing recommender systems.
Fairness and Bias in Recommender Systems
Alleviating the unfairness issue with knowledge-aware recommendation models
LLM-augmented counterfactual explanations
Improving faithfulness and user-preference alignment
This thesis introduces PopSteer, a novel, interpretable post-processing strategy for analyzing and mitigating popularity bias in deep recommender systems. PopSteer builds on a Sparse Autoencoder (SAE) that converts dense embeddings into a sparse feature space in which individual neurons align with human-readable features. PopSteer consists of three stages: (i) SAE training, (ii) neuron analysis using synthetic data, and (iii) neuron steering. In the training stage, an SAE is attached to the hidden representation of a pretrained model to produce a relatively disentangled feature space in which individual neurons correspond to features. In the neuron analysis stage, two synthetic user sets are passed through the SAE: one set favoring popular items and the other favoring unpopular items. Each neuron’s alignment with popularity is quantified by the difference in its activations between the two sets, measured with Cohen’s d. In the neuron steering stage, the activations of the most biased neurons are adjusted.
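The neuron analysis stage reduces to a per-neuron effect size. The sketch below computes Cohen's d with a pooled standard deviation between SAE activations of the popular-favoring and unpopular-favoring synthetic users. The steering rule shown (scaling the top-k positively aligned neurons by a factor `alpha`) is a hypothetical stand-in, because the record states only that activations "are adjusted". Decoding the steered activations back through the SAE is not shown.

```python
import math
from statistics import mean, variance
from typing import List, Sequence


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Effect size (mean(a) - mean(b)) / pooled std; each group needs at least 2 samples."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled if pooled > 0 else 0.0


def neuron_bias_scores(pop_acts: List[List[float]], unpop_acts: List[List[float]]) -> List[float]:
    """pop_acts[u][j] is SAE neuron j's activation for synthetic popular-favoring user u."""
    n_neurons = len(pop_acts[0])
    return [
        cohens_d([row[j] for row in pop_acts], [row[j] for row in unpop_acts])
        for j in range(n_neurons)
    ]


def steer(activations: List[float], scores: List[float], top_k: int, alpha: float) -> List[float]:
    """Hypothetical rule: scale the top_k most popularity-aligned neurons (d > 0) by alpha."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    chosen = {j for j in ranked[:top_k] if scores[j] > 0}
    return [a * alpha if j in chosen else a for j, a in enumerate(activations)]
```

Ranking neurons by d before steering reflects the record's finding that selecting the right neurons matters more than how many are steered.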
Experimental results show that PopSteer consistently increases exposure fairness with only minor accuracy degradation compared to state-of-the-art baselines. The effects are stronger when synthetic user sets, rather than real data, are used in the neuron analysis stage, because their extreme preference patterns isolate the bias more cleanly. Furthermore, the neuron-level analysis provides insight into how popularity bias emerges within model embeddings and validates the interpretability of individual neurons. Sensitivity analysis shows that the hyperparameters have a predictable but asymmetric effect on accuracy and fairness. Selecting the right neurons matters more than the number of neurons steered, which keeps the intervention focused and efficient. Inference and training costs remain modest, indicating that the method is deployable. Overall, the results indicate that PopSteer provides an effective and interpretable way to reduce popularity bias in deep recommender systems while keeping accuracy loss manageable.