Comparative Analysis of Techniques for Data Minimization for Recommender System algorithms

Master Thesis (2019)
Author(s)

Manoj Krishnaraj (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Martha Larson – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
expand_more
Publication Year
2019
Language
English
Graduation Date
25-11-2019
Awarding Institution
Delft University of Technology
Programme
Computer Science, Data Science and Technology
Faculty
Electrical Engineering, Mathematics and Computer Science
Downloads counter
165
Collections
thesis
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Recommender systems (RS) often use a large amount of data for a marginal gain in performance. This thesis investigates the data minimization in Recommender Systems, which is not well studied in the literature. This thesis extends the data minimization principles advocated in GDPR and studies its effects on recommender systems. Minimizing data not only reduces storage and transmission requirements but also has the potential to improve privacy and increase training and prediction speeds. This thesis investigates the effects of reducing the amount of data used to model a recommender system. It evaluates the accuracy of the Biased Matrix Factorization (BMF) algorithm by varying the training data on the MovieLens 10 million ratings (ML-10M) dataset. In this thesis, four data minimization techniques were used. We reproduced one pervious work and proposed three new data minimization techniques. In the first technique, we confirmed previous work concerning training data analysis, where the data outside the selected temporal window were dropped. The second data minimization technique, user profile truncation, retained the recent N ratings for each of the users while truncating the historical ratings. The third technique improved the user profile truncation by selectively truncating a percentage of user's historical ratings. In the fourth technique, a long user profile was split into smaller pseudo-user profiles. Analysis of the results is conducted. The most interesting results come from the third data minimization technique. Here, we show that truncating a percentage of the least recently active long user-profiles does not damage the performance and may slightly help. 60% of the long users can truncate their profiles to 20 ratings with minimal impact on the performance. Based on the results, we conclude that a substantial amount of data can be dropped without a large impact on performance. The results hold for the ML-10M dataset. It should hold for other datasets. The privacy implications of data minimization warrant future work. The proposed techniques serve as a guide for future research in data minimization of recommender systems.

Files

License info not available