Adapting to Dynamic User Preferences in Recommendation Systems via Deep Reinforcement Learning

Bachelor Thesis (2022)
Author(s)

P.L. Pantea (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Frans A Oliehoek – Mentor (TU Delft - Interactive Intelligence)

Aleksander Czechowski – Mentor (TU Delft - Interactive Intelligence)

D. Mambelli – Mentor (TU Delft - Interactive Intelligence)

O. Azizi – Mentor (TU Delft - Algorithmics)

DMJ Tax – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2022 Luca Pantea
Publication Year
2022
Language
English
Graduation Date
24-06-2022
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Recommender systems play a significant part in filtering and efficiently prioritizing relevant information, alleviating the information overload problem and maximizing user engagement. Traditional recommender systems take a static approach to learning user preferences: they rely on logged past interactions with the system and thereby disregard the sequential nature of the recommendation task and, consequently, the user preference shifts that occur across interactions. In this study, we formulate the recommendation task as a slate Markov Decision Process (slate-MDP) and leverage deep reinforcement learning (DRL) to learn recommendation policies through sequential interaction, maximizing user engagement over extended horizons in non-stationary environments. We construct simulated environments with varying degrees of preference dynamics and benchmark two DRL-based algorithms: FullSlateQ, a non-decomposed full-slate Q-learning method based on a DQN agent, and SlateQ, which implements DQN using slate decomposition. Our findings suggest that SlateQ outperforms FullSlateQ by 10.57% in non-stationary environments, and that with a moderate discount factor, both algorithms behave myopically and fail to make the trade-offs needed to maximize long-term user engagement.
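The contrast between the two agents is easiest to see in how they score a slate. FullSlateQ treats each whole slate as one atomic action, so its action space grows combinatorially with the catalogue, whereas SlateQ decomposes a slate's value into itemwise values weighted by a user choice model: Q(s, A) = sum over i in A of P(i | s, A) * Qbar(s, i). Below is a minimal sketch of that decomposition, assuming a conditional-logit choice model with a "no click" option; the function and variable names are illustrative and not taken from the thesis.

```python
import numpy as np

def slate_q_value(user_scores, item_q_values, slate):
    """Decomposed Q-value of a slate in the SlateQ style:
    Q(s, A) = sum over i in A of P(i | s, A) * Qbar(s, i),
    where P(i | s, A) is a user choice model and Qbar(s, i) is the
    long-term value of the user consuming item i.

    user_scores, item_q_values: per-item affinities and itemwise
    Q-values (illustrative placeholders, not the thesis API).
    """
    scores = np.array([user_scores[i] for i in slate], dtype=float)
    # Conditional-logit choice model over the slate plus a "no click"
    # option with score 0 (a common modelling assumption).
    logits = np.concatenate([scores, [0.0]])
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    q_bar = np.array([item_q_values[i] for i in slate], dtype=float)
    # The no-click outcome contributes no itemwise value here.
    return float(probs[:-1] @ q_bar)

# Example: compare two candidate slates for the same user state.
user_scores = {"a": 2.0, "b": 0.5, "c": 1.0}
item_q = {"a": 1.2, "b": 3.0, "c": 0.8}
print(slate_q_value(user_scores, item_q, ["a", "b"]))
print(slate_q_value(user_scores, item_q, ["a", "c"]))
```

Because only the itemwise values Qbar(s, i) are learned, the combinatorial slate space enters only through slate selection at serving time, which is what keeps the decomposed agent tractable as the catalogue grows.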
