Adapting to Dynamic User Preferences in Recommendation Systems via Deep Reinforcement Learning

Abstract

Recommender systems play a significant role in filtering and prioritizing relevant information, alleviating the information overload problem and maximizing user engagement. Traditional recommender systems take a static approach to learning user preferences: they rely on logged past interactions with the system and disregard the sequential nature of the recommendation task and, consequently, the shifts in user preferences that occur across interactions. In this study, we formulate the recommendation task as a slate Markov Decision Process (slate-MDP) and leverage deep reinforcement learning (DRL) to learn recommendation policies through sequential interactions and to maximize user engagement over extended horizons in non-stationary environments. We construct a simulated environment with varying degrees of preference dynamics and benchmark two DRL-based algorithms: FullSlateQ, a non-decomposed full-slate Q-learning approach based on a DQN agent, and SlateQ, which implements DQN with slate decomposition. Our findings suggest that SlateQ outperforms FullSlateQ by 10.57% in non-stationary environments, and that with a moderate discount factor the algorithms behave myopically and fail to make an appropriate trade-off to maximize long-term user engagement.
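
To make the slate decomposition mentioned above concrete, the sketch below illustrates the general SlateQ idea: the Q-value of a whole slate is expressed as a sum of per-item Q-values weighted by the probability that the user selects each item under a choice model. This is an illustrative example only, not the implementation used in the study; the conditional-logit choice model, the `no_click_score` parameter, and the example numbers are assumptions for demonstration.

```python
import numpy as np

def slate_q_value(item_scores, item_q_values, no_click_score=0.0):
    """Illustrative SlateQ-style decomposition (a sketch, not the study's code).

    Q(s, A) = sum_i P(click i | s, A) * Q_bar(s, i), where the click
    probabilities come from a conditional-logit user choice model over the
    slate items plus a 'no click' alternative with logit `no_click_score`.
    """
    scores = np.asarray(item_scores, dtype=float)   # assumed user-affinity scores
    q_vals = np.asarray(item_q_values, dtype=float) # item-level long-term values
    # Logits for each slate item and the no-click option.
    logits = np.append(scores, no_click_score)
    logits -= logits.max()                          # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    click_probs = probs[:-1]                        # drop the no-click entry
    # Decomposed slate Q-value: expected item-level long-term value.
    return float(np.dot(click_probs, q_vals))

# Example: a slate of three items with hypothetical affinity scores
# and item-level Q-values.
print(slate_q_value([2.0, 1.0, 0.5], [5.0, 3.0, 4.0]))
```

In this decomposition, only item-level Q-values need to be learned, which is what makes slate recommendation tractable compared with learning a Q-value for every possible slate, as the non-decomposed FullSlateQ baseline must.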