O. Azizi | TU Delft Repository

Disentangling Latent Representations in Non-Stationary Reinforcement Learning

Master thesis (2025) - R. Haket, M.T.J. Spaan, J.H. Krijthe, O. Azizi

Model-free deep reinforcement learning has shown remarkable promise in solving highly complex sequential decision-making problems. However, the widespread adoption of reinforcement learning algorithms has not materialized in real-world applications such as robotics. A primary challenge is the general assumption that environments remain stationary at deployment. This problem is exacerbated when agents rely on pixel-based observations, which dramatically increase task complexity. As a result, these algorithms often fail when environments change over time. Learning a disentangled representation has already been shown to improve generalization to domain shifts. We extend previous work by introducing Generalized Disentanglement (GED), an auxiliary contrastive learning task that encourages pixel-based deep reinforcement learning algorithms to isolate factors of variation in the data by leveraging temporal structure. We show that our methodology can improve generalization to unseen domains in several environments. ...

Minimizing the Long-tail Problem in Collaborative Filtering Based Recommender Systems Using Clustering

Bachelor thesis (2022) - Y. Mundhra, F.A. Oliehoek, A.T. Czechowski, D. Mambelli, O. Azizi, D.M.J. Tax

Recommender systems are an essential part of today's online businesses. They provide users with meaningful recommendations for items and products. A frequently occurring problem in recommender systems is known as the long-tail problem. It refers to a situation in which the majority of items in the data set have few ratings, so that many recommender systems, especially collaborative filtering based methods, are unable to recommend these so-called long-tail items. Although popular items are easier to recommend, long-tail items often generate a significant fraction of the revenue and should therefore also be recommended to users. This paper proposes a modified collaborative filtering based recommender system that aims to reduce the effects of the long-tail recommendation problem (LTRP). The algorithm first splits the data set into the head H and the tail T and clusters the items in the tail. The average rating avg is calculated for each cluster; then, for every user and each of that user's unrated long-tail items, the rating is set to the avg of the item's cluster with probability p. The standard collaborative filtering algorithm is then run on the data set with the newly inserted ratings. The inserted ratings reduce the sparsity of the data set and therefore make it easier to recommend long-tail items. Empirical experiments on the MovieLens 100K data set indicate that the proposed algorithm recommends more long-tail items than the standard collaborative filtering algorithm, thus reducing the effects of the LTRP while maintaining the same or slightly lower accuracy. ...
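
The rating-insertion step described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the thesis implementation: the function names `split_head_tail` and `insert_tail_ratings`, the nested-dictionary rating format, and the fixed-size head are all assumptions, and the clustering itself is left to the caller because the abstract does not say which method is used.

```python
import random
from collections import defaultdict


def split_head_tail(ratings, head_size):
    """Rank items by rating count; the top `head_size` items form the head H.

    Assumption: the head is defined by a fixed item count. The abstract
    does not state how the split point is chosen.
    """
    counts = defaultdict(int)
    for user_ratings in ratings.values():
        for item in user_ratings:
            counts[item] += 1
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    return set(ranked[:head_size]), set(ranked[head_size:])


def insert_tail_ratings(ratings, tail_clusters, p, seed=0):
    """Return a copy of `ratings` with cluster averages inserted.

    ratings:       {user: {item: rating}}
    tail_clusters: {tail_item: cluster_id}, from any clustering method
    p:             probability of inserting avg for an unrated tail item
    """
    rng = random.Random(seed)

    # Average observed rating per tail cluster.
    totals, counts = defaultdict(float), defaultdict(int)
    for user_ratings in ratings.values():
        for item, rating in user_ratings.items():
            if item in tail_clusters:
                totals[tail_clusters[item]] += rating
                counts[tail_clusters[item]] += 1
    avg = {cluster: totals[cluster] / counts[cluster] for cluster in counts}

    # For every user and unrated tail item, insert avg with probability p.
    filled = {user: dict(user_ratings) for user, user_ratings in ratings.items()}
    for user in filled:
        for item, cluster in sorted(tail_clusters.items()):
            if item not in filled[user] and cluster in avg and rng.random() < p:
                filled[user][item] = avg[cluster]
    return filled


ratings = {
    "u1": {"a": 5, "b": 4, "c": 2},
    "u2": {"a": 4, "b": 3},
    "u3": {"a": 3, "d": 4},
}
head, tail = split_head_tail(ratings, head_size=2)
clusters = {item: 0 for item in tail}  # hypothetical: one cluster for the whole tail
filled = insert_tail_ratings(ratings, clusters, p=1.0)
```

The densified `filled` ratings would then be passed to an unmodified collaborative filtering algorithm. Setting `p` below 1 inserts only a random subset of the cluster averages.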

Evaluating Design Choices in Tripartite Graph-Based Recommender Systems to Improve Long Tail Recommendations

Bachelor thesis (2022) - Thomas Crul, F.A. Oliehoek, A.T. Czechowski, D. Mambelli, O. Azizi, D.M.J. Tax

Even though the ability to recommend items in the long tail is one of the main strengths of recommendation systems, modern models still show decreased performance when recommending these niche items. Various bipartite and tripartite graph-based models have been proposed that are specifically tailored to solving this long tail issue. This study aims to investigate the effect of the design of the additional layer introduced by tripartite graph-based recommender systems on their performance. All options available in the MovieLens 1M dataset are evaluated on recall and diversity. Experimental results suggest that tripartite graphs based on latent information describing the users perform better than ones utilising item-based latent information, but both of these options hardly outperform the baseline bipartite model. Regardless of the graph used, normalising the transition matrix is found to significantly increase performance. It is hypothesised that larger user-focused additional layers show increased diversity over smaller options when normalised. Issues regarding the reproducibility of previous research are identified and addressed, and the development of unified evaluation metrics is advocated to prevent such problems in the future. ...
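
As a rough illustration of what normalising a transition matrix can mean for a graph-based recommender, the sketch below row-normalises a weighted adjacency matrix so that each row becomes a probability distribution over neighbours. Row normalisation is an assumption here, since the abstract does not say which scheme the thesis uses, and the helper names `row_normalise` and `walk_step` are hypothetical.

```python
def row_normalise(adjacency):
    """Turn a weighted adjacency matrix into a row-stochastic transition matrix."""
    transition = []
    for row in adjacency:
        total = sum(row)
        # Rows of isolated nodes stay all-zero to avoid dividing by zero.
        transition.append([w / total if total else 0.0 for w in row])
    return transition


def walk_step(distribution, transition):
    """One random-walk step: new[j] = sum_i distribution[i] * transition[i][j]."""
    n_rows, n_cols = len(transition), len(transition[0])
    return [
        sum(distribution[i] * transition[i][j] for i in range(n_rows))
        for j in range(n_cols)
    ]


# Toy bipartite graph with node order [u1, u2, i1, i2]; weights are ratings.
adjacency = [
    [0, 0, 4, 1],
    [0, 0, 0, 2],
    [4, 0, 0, 0],
    [1, 2, 0, 0],
]
transition = row_normalise(adjacency)
after_one_step = walk_step([1.0, 0.0, 0.0, 0.0], transition)  # walk starts at u1
```

After normalisation, a walk that starts at a user keeps a total probability mass of 1, so scores of heavily rated and sparsely rated nodes stay on a comparable scale.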

Alleviating the cold-start problem by using demographic data and domain-aware similarity measure

Bachelor thesis (2022) - R.C. Kalaria, F.A. Oliehoek, A.T. Czechowski, O. Azizi, D. Mambelli, D.M.J. Tax

Recommender systems (RSs) are a cornerstone for most online businesses that cater to a large customer base, such as e-commerce and social network platforms. RSs enable these platforms to provide tailor-made experiences to each of their customers by strategically utilizing user/item rating data or any other available data. Collaborative filtering (CF) techniques are some of the most popular and successful RS models created. However, CF techniques often suffer from the cold start (CS) problem. In particular, they struggle with complete cold start (CCS) situations, in which no user/item rating history is available, and incomplete cold start (ICS) situations, in which only a limited amount of user/item rating history is available.
In this paper, we explore two models which utilize novel ideas to combat the CCS and ICS problems. The first model (DCF) focuses on the intelligent use of user demographic data to combat the CCS problem. The second model (PIPCF) focuses on the use of a novel domain-specific similarity measure called Proximity-Impact-Popularity (PIP) to combat the ICS problem. We also propose our own model (DPIP-CF), which combines these two ideas with some of our own modifications to combat the CCS and ICS problems simultaneously.
We utilize the MovieLens data set, a popular and widely available data set that is often used to test RSs. Through a series of experiments, we demonstrate the strengths of DCF and PIPCF in dealing with the CCS and ICS problems respectively. Finally, we also show that our DPIP-CF model outperforms all other models discussed in this paper and is a viable solution for dealing with the CCS and ICS problems simultaneously. ...

Adapting to Dynamic User Preferences in Recommendation Systems via Deep Reinforcement Learning

Bachelor thesis (2022) - P.L. Pantea, F.A. Oliehoek, A.T. Czechowski, D. Mambelli, O. Azizi, D.M.J. Tax

Recommender systems play a significant part in filtering and efficiently prioritizing relevant information to alleviate the information overload problem and maximize user engagement. Traditional recommender systems take a static approach to learning the user's preferences: they rely on logged previous interactions with the system and disregard the sequential nature of the recommendation task and, consequently, the shifts in user preference that occur across interactions. In this study, we formulate the recommendation task as a slate Markov Decision Process (slate-MDP) and leverage deep reinforcement learning (DRL) to learn recommendation policies through sequential interactions and maximize user engagement over extended horizons in non-stationary environments. We construct the simulated environment with various degrees of preferential dynamics and benchmark two DRL-based algorithms: FullSlateQ, a non-decomposed full-slate Q-learning method based on a DQN agent, and SlateQ, which implements DQN using slate decomposition. Our findings suggest that SlateQ outperforms FullSlateQ by 10.57% in non-stationary environments and that, with a moderate discount factor, the algorithms behave myopically and fail to make an appropriate tradeoff to maximize long-term user engagement. ...
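
The slate decomposition that distinguishes SlateQ from a full-slate Q-function can be sketched as below: the value of a slate is the choice-probability-weighted sum of item-level Q-values, Q(s, A) = sum over i in A of P(i | s, A) * Q(s, i). The choice model used here (scores normalised together with a fixed no-click score), the function names, and all numbers are illustrative assumptions and are not taken from the thesis.

```python
from itertools import combinations


def slate_value(slate, item_q, choice_score, no_click_score=1.0):
    """Decomposed slate value: sum_i P(i | slate) * Q(s, i).

    P(i | slate) is the item's choice score divided by the sum of all
    scores on the slate plus a no-click score (an assumed choice model).
    Item position within the slate is ignored in this sketch.
    """
    normaliser = no_click_score + sum(choice_score[i] for i in slate)
    return sum(choice_score[i] / normaliser * item_q[i] for i in slate)


def best_slate(items, slate_size, item_q, choice_score):
    """Brute-force search over all slates; only feasible for small item sets."""
    return max(
        combinations(items, slate_size),
        key=lambda slate: slate_value(slate, item_q, choice_score),
    )


# Hypothetical item-level Q-values and choice scores for one user state.
item_q = {"a": 1.0, "b": 2.0, "c": 0.5}
choice_score = {"a": 1.0, "b": 1.0, "c": 2.0}

value_ab = slate_value(("a", "b"), item_q, choice_score)
chosen = best_slate(["a", "b", "c"], 2, item_q, choice_score)
```

Because only item-level values Q(s, i) have to be learned, the number of quantities to estimate grows with the number of items rather than with the number of possible slates, which is the motivation for decomposition over a full-slate Q-function.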