Alleviating the cold-start problem by using demographic data and domain-aware similarity measure

Bachelor Thesis (2022)
Author(s)

R.C. Kalaria (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

F.A. Oliehoek – Mentor (TU Delft - Interactive Intelligence)

Aleksander Czechowski – Mentor (TU Delft - Interactive Intelligence)

O. Azizi – Mentor (TU Delft - Algorithmics)

D. Mambelli – Mentor (TU Delft - Interactive Intelligence)

DMJ Tax – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2022 Rahul Kalaria
Publication Year
2022
Language
English
Graduation Date
24-06-2022
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Recommender systems (RSs) are a cornerstone of most online businesses that cater to a large customer base, such as e-commerce and social network platforms. RSs enable these platforms to tailor the experience to each customer by exploiting user/item rating data or any other available data. Collaborative filtering (CF) techniques are among the most popular and successful RS models. However, CF techniques often suffer from the cold-start (CS) problem. In particular, they struggle in complete cold-start (CCS) situations, in which no user/item rating history is available, and in incomplete cold-start (ICS) situations, in which only a limited amount of user/item rating history is available.
In this paper, we explore two models that use novel ideas to combat the CCS and ICS problems. The first model (DCF) uses user demographic data to combat the CCS problem. The second model (PIPCF) uses a novel domain-specific similarity measure called Proximity-Impact-Popularity (PIP) to combat the ICS problem. We also propose our own model (DPIP-CF), which combines these two ideas with some of our own modifications to combat the CCS and ICS problems simultaneously.
We use the MovieLens dataset, a popular and widely available dataset that is often used to test RSs. Through a series of experiments, we demonstrate the strengths of DCF and PIPCF in dealing with the CCS and ICS problems, respectively. Finally, we show that our DPIP-CF model outperforms all other models discussed in this paper and is a viable solution for dealing with the CCS and ICS problems simultaneously.
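
As a rough illustration of how demographic data can stand in for a missing rating history in the CCS setting, the following minimal sketch predicts ratings for a brand-new user from the ratings of the most demographically similar existing users. It is not the DCF model from the thesis: the feature encoding, the use of cosine similarity, the neighbourhood size `k`, and the names `cosine` and `predict_for_new_user` are all assumptions made for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict_for_new_user(new_demo, demos, ratings, k=2):
    """Predict item ratings for a user who has no rating history.

    new_demo: demographic feature vector of the new user (hypothetical encoding)
    demos:    {user_id: demographic feature vector} for existing users
    ratings:  {user_id: {item_id: rating}} for existing users
    """
    # Keep the k existing users whose demographics are most similar.
    neighbours = sorted(
        ((cosine(new_demo, d), u) for u, d in demos.items()), reverse=True
    )[:k]
    items = {i for _, u in neighbours for i in ratings.get(u, {})}
    predictions = {}
    for item in items:
        rated = [(s, u) for s, u in neighbours if item in ratings.get(u, {})]
        weight = sum(s for s, _ in rated)
        if weight > 0:
            # Similarity-weighted average of the neighbours' ratings.
            predictions[item] = sum(s * ratings[u][item] for s, u in rated) / weight
    return predictions

# Toy data; the three features are an assumed one-hot style encoding.
demos = {"u1": [1, 0, 1], "u2": [1, 0, 0], "u3": [0, 1, 0]}
ratings = {"u1": {"A": 5, "B": 3}, "u2": {"A": 3}, "u3": {"A": 1, "B": 1}}
predictions = predict_for_new_user([1, 0, 1], demos, ratings, k=2)
```

Here the new user shares a demographic profile with `u1` and partly with `u2`, so the prediction for item `A` falls between their two ratings and leans towards that of `u1`, while `u3` is ignored.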
