Metric Optimization and Mainstream Bias Mitigation in Recommender Systems
Roger Zhe Li (TU Delft - Multimedia Computing)
A. Hanjalic – Promotor (TU Delft - Intelligent Systems)
J. Urbano Merino – Copromotor (TU Delft - Multimedia Computing)
Abstract
Recommender Systems have drawn extensive attention in recent decades because they are a powerful tool with the potential to benefit several business stakeholders, including end users, sellers, and platform providers, through personalized recommendations. The most important factor in making a recommender succeed is user satisfaction, which is largely reflected by recommendation accuracy. Therefore, one primary question in recommender systems research is how to make all users enjoy good recommendation accuracy. This thesis dives into this question from two different perspectives that, unfortunately, are in tension with each other: achieving the maximum overall recommendation accuracy, and balancing that accuracy among all users.
The first part of this thesis focuses on the first perspective, that is, maximizing the overall recommendation accuracy. This accuracy is usually evaluated with some user-oriented metric tailored to the recommendation scenario, but because recommendation is usually treated as a machine learning problem, recommendation models are trained to maximize some other generic criterion that does not necessarily align with what the user-oriented evaluation metric ultimately captures. Recent research aims at bridging this gap between training and evaluation via direct ranking optimization, but still assumes that the metric used for evaluation should also be the metric used for training. We challenge this assumption, mainly because some metrics are more informative than others. Indeed, we show that models trained via the optimization of a loss inspired by Rank-Biased Precision (RBP) tend to yield higher accuracy, even when accuracy is measured with metrics other than RBP. However, the superiority of this RBP-inspired loss stems from further benefiting users who are already well served, rather than helping those who are not.
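To make the role of RBP concrete, the metric itself can be computed as below. The persistence parameter p models how patient the user is; the exact loss used in the thesis is not reproduced here, only the metric that inspires it:

```python
def rbp(relevances, p=0.9):
    """Rank-Biased Precision: expected rate at which a user gains
    utility, under a browsing model where the user inspects rank k+1
    with probability p**k. Higher p models a more persistent user."""
    return (1 - p) * sum(rel * p ** k for k, rel in enumerate(relevances))

# A hit at rank 1 is worth far more than the same hit at rank 3:
rbp([1, 0, 0], p=0.5)  # 0.5
rbp([0, 0, 1], p=0.5)  # 0.125
```

Because the p**k discount decays geometrically, top-heavy rankings dominate the score, which is one reason a loss built around RBP concentrates its attention on the head of the ranking.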
This observation inspires the second part of this thesis, where our focus turns to helping non-mainstream users. These are users who are difficult to recommend to either because there is not enough data to model them, or because they have niche taste and thus few similar users to look at when recommending in a collaborative way. These differences in mainstreamness introduce a bias reflected in an accuracy gap between users or user groups, which we try to narrow.
Our first effort consists in using side data, beyond the user-item interaction matrix, so that users and items are better represented in the recommendation model. This benefits especially the non-mainstream users, for whom the user-item matrix alone is ineffective. We propose Neural AutoEncoder Collaborative Filtering (NAECF), an adversarial learning architecture that, in addition to maximizing recommendation accuracy, leverages side data to preserve the intrinsic properties of users and items. We experiment with review texts as side data and show that NAECF leads to better recommendations, especially for non-mainstream users, at only a marginal cost to the mainstream ones.
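A minimal sketch of such a joint objective follows. This is not the exact NAECF formulation: the trade-off weights lambda_u and lambda_i, and the use of mean squared error for the recommendation term, are illustrative assumptions; the point is only that reconstruction of side data enters the training objective alongside accuracy.

```python
import numpy as np

def joint_objective(pred, target, user_recon_err, item_recon_err,
                    lambda_u=1.0, lambda_i=1.0):
    """Sketch of an NAECF-style objective: a recommendation loss
    (MSE here) plus autoencoder reconstruction terms that keep the
    learned user/item representations faithful to their side data.
    lambda_u and lambda_i balance accuracy against preservation."""
    rec_loss = np.mean((np.asarray(pred) - np.asarray(target)) ** 2)
    return rec_loss + lambda_u * user_recon_err + lambda_i * item_recon_err
```

Raising lambda_u and lambda_i trades a little overall accuracy for representations that remain informative for users the interaction matrix alone cannot model.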
Our second effort consists in explicitly signaling to the training process which users it should focus on, that is, the non-mainstream ones. In particular, we propose a mechanism based on cost-sensitive learning that weighs users according to their mainstreamness, so that they receive more attention during training. Here we argue for quantifying not mainstreamness directly, but rather its effect, and therefore weigh users depending on how well they are served by a vanilla recommendation model. The result is a recommendation model tailored to non-mainstream users that narrows the accuracy gap, again at negligible cost to the mainstream users.