Data masking for recommender systems

Prediction performance and rating hiding

Conference Paper (2019)

Author(s)

Manel Slokom (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Martha Larson (TU Delft - Electrical Engineering, Mathematics and Computer Science, Radboud Universiteit Nijmegen)

Alan Hanjalic (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group

Multimedia Computing

Recommender systems; Data masking; Privacy-preserving data publishing

URL related publication

http://ceur-ws.org/Vol-2431/ (final published version)

To reference this document use

https://resolver.tudelft.nl/uuid:e4e7ef06-fadc-4d3f-89f2-95e006386aec

Publication Year

2019

Language

English

Pages (from-to)

21-25

Event

2019 ACM Conference on Recommender Systems Late-breaking Results, ACM RecSys LBR 2019 (2019-09-16 - 2019-09-20), Copenhagen, Denmark

Collections

Institutional Repository

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Data science challenges allow companies and other data holders to collaborate with the wider research community. In the area of recommender systems, the potential of such challenges to advance the state of the art is limited by concerns about releasing user interaction data. This paper investigates the potential of privacy-preserving data publishing for supporting recommender system challenges. We propose a data masking algorithm, Shuffle-NNN, with two steps: neighborhood selection and value swapping. Neighborhood selection preserves valuable item similarity information. Value swapping, a data shuffling technique, hides (i.e., changes) the ratings of users for individual items. Our experimental results demonstrate that the relative performance of algorithms, which is the key property that a data science challenge must measure, is comparable between the original data and the data masked with Shuffle-NNN.
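The two steps named in the abstract can be illustrated with a minimal sketch. This is not the Shuffle-NNN algorithm as published; the abstract does not give its details. The sketch assumes a dense user-by-item rating matrix in which 0 means "not rated", cosine similarity between item columns, and one neighborhood per user made up of the k rated items most similar to a randomly chosen anchor item. The function names and the parameters `k` and `seed` are hypothetical.

```python
import numpy as np


def item_cosine_similarity(ratings):
    """Cosine similarity between item columns (assumed similarity measure)."""
    norms = np.linalg.norm(ratings, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for unrated items
    normalized = ratings / norms
    return normalized.T @ normalized


def shuffle_within_neighborhoods(ratings, k=3, seed=0):
    """Illustrative masking: swap a user's ratings among similar items.

    Assumption: 0 marks a missing rating. For each user, one neighborhood
    is selected and the rating values inside it are randomly permuted.
    """
    rng = np.random.default_rng(seed)
    sim = item_cosine_similarity(ratings)
    masked = ratings.copy()
    for u in range(ratings.shape[0]):
        rated = np.flatnonzero(ratings[u])
        if rated.size < 2:
            continue  # nothing to swap for this user
        # Step 1 (neighborhood selection): up to k rated items that are
        # most similar to a randomly chosen anchor item.
        anchor = rng.choice(rated)
        neighborhood = rated[np.argsort(-sim[anchor, rated])][:k]
        # Step 2 (value swapping): permute rating values within the neighborhood.
        masked[u, neighborhood] = ratings[u, rng.permutation(neighborhood)]
    return masked


# Toy data, invented for illustration only.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)
M = shuffle_within_neighborhoods(R, k=3, seed=0)
```

Under these assumptions, each user keeps the same set of rated items and the same multiset of rating values, while the value attached to an individual item may change. Whether the published algorithm shares these properties should be checked against the paper itself.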

Files

Paper5.pdf

(pdf | 1.15 Mb)