Comparative Analysis of Techniques for Data Minimization for Recommender System algorithms

Master Thesis (2019)

Author(s)

Manoj Krishnaraj (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Martha Larson – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Recommender Systems, Collaborative Filtering, Data Reduction

To reference this document use

https://resolver.tudelft.nl/uuid:35e19f20-6161-4755-b9a5-7714af15a840

More Info

Publication Year

2019

Language

English

Graduation Date

25-11-2019

Awarding Institution

Delft University of Technology

Programme

Computer Science, Data Science and Technology

Collections

thesis

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Recommender systems (RS) often use large amounts of data for a marginal gain in performance. This thesis investigates data minimization in recommender systems, a topic that is not well studied in the literature. It extends the data minimization principles advocated in the GDPR and studies their effects on recommender systems. Minimizing data not only reduces storage and transmission requirements but also has the potential to improve privacy and to speed up training and prediction. The thesis examines the effects of reducing the amount of data used to train a recommender system: it evaluates the accuracy of the Biased Matrix Factorization (BMF) algorithm while varying the training data drawn from the MovieLens 10 million ratings (ML-10M) dataset. Four data minimization techniques were used; we reproduced one previous work and proposed three new techniques. In the first technique, we confirmed previous work on training data analysis, in which the data outside a selected temporal window were dropped. The second technique, user profile truncation, retained the most recent N ratings of each user and truncated the older ratings. The third technique refined user profile truncation by truncating the historical ratings of only a selected percentage of users. In the fourth technique, a long user profile was split into smaller pseudo-user profiles. The most interesting results come from the third technique. Here, we show that truncating a percentage of the least recently active long user profiles does not damage performance and may slightly help. The profiles of 60% of the long users can be truncated to 20 ratings with minimal impact on performance. Based on the results, we conclude that a substantial amount of data can be dropped without a large impact on performance. The results hold for the ML-10M dataset and are expected to hold for other datasets. The privacy implications of data minimization warrant future work. The proposed techniques serve as a guide for future research on data minimization in recommender systems.
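
The four techniques named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the `(user, item, rating, timestamp)` tuple layout, all function and parameter names, the `long_threshold` definition of a "long" profile, the use of the latest timestamp as the measure of recent activity, and the fixed-size chunking rule for pseudo-users are assumptions made only for illustration.

```python
from collections import defaultdict

# Assumed layout: each rating is a (user, item, rating, timestamp) tuple.


def by_user(ratings):
    """Group ratings per user, each profile sorted oldest to newest."""
    profiles = defaultdict(list)
    for r in ratings:
        profiles[r[0]].append(r)
    for profile in profiles.values():
        profile.sort(key=lambda r: r[3])
    return profiles


def temporal_window(ratings, start, end):
    """Technique 1: drop every rating outside the window [start, end]."""
    return [r for r in ratings if start <= r[3] <= end]


def truncate_profiles(ratings, n):
    """Technique 2: keep only the n most recent ratings of every user."""
    out = []
    for profile in by_user(ratings).values():
        out.extend(profile[max(0, len(profile) - n):])
    return out


def selective_truncate(ratings, n, fraction, long_threshold):
    """Technique 3 (one possible reading): truncate to n ratings only the
    given fraction of long users that were least recently active."""
    profiles = by_user(ratings)
    # Assumption: a "long" user has more than long_threshold ratings.
    long_users = [u for u, p in profiles.items() if len(p) > long_threshold]
    # Assumption: activity is measured by the user's latest timestamp.
    long_users.sort(key=lambda u: profiles[u][-1][3])
    to_truncate = set(long_users[:int(len(long_users) * fraction)])
    out = []
    for user, profile in profiles.items():
        if user in to_truncate:
            out.extend(profile[max(0, len(profile) - n):])
        else:
            out.extend(profile)
    return out


def split_profiles(ratings, chunk):
    """Technique 4 (one possible reading): split each profile into
    consecutive pseudo-user profiles of at most chunk ratings."""
    out = []
    for user, profile in by_user(ratings).items():
        for i in range(0, len(profile), chunk):
            pseudo_user = (user, i // chunk)  # hypothetical pseudo-user id
            out.extend(
                (pseudo_user, item, rating, ts)
                for _, item, rating, ts in profile[i:i + chunk]
            )
    return out
```

Under these assumptions, a call such as `selective_truncate(ratings, n=20, fraction=0.6, long_threshold=...)` would correspond to the setting reported above, in which 60% of the long users are truncated to 20 ratings. The threshold that defines a long user is not given in the abstract and would have to be taken from the thesis itself.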

Files

Manoj_Krishnaraj_thesis_final.... (pdf)

(pdf | 1.25 Mb)

License info not available