Multi-expert Preference Alignment in Reinforcement Learning

Author: Li, LITIAN (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributor: Cavalcante Siebert, L. (mentor)
Degree granting institution: Delft University of Technology
Programme: Computer Science
Date: 2024-02-06

Abstract: This project explores adaptation to preference shifts in Multi-objective Reinforcement Learning (MORL), focusing on how Reinforcement Learning (RL) agents can align with the preferences of multiple experts. Such alignment can occur across scenarios featuring distinct expert preferences, or within a single scenario whose preferences shift over time. Unlike traditional RL, which requires retraining a policy each time a new expert preference is introduced (a computationally complex and impractical approach), this project proposes a single-policy RL algorithm named Generalized Preference-based PPO (GPB PPO). The algorithm integrates environmental information and experts' preference requirements throughout the decision-making process. By exposing the agent to diverse preference scenarios during training, it learns a policy conditioned on preference that generalizes to any given preference, eliminating the need for explicit retraining and additional adaptation when preferences shift. The generalization and adaptation capabilities of GPB PPO are evaluated in both stationary and non-stationary environments.

Subject: Reinforcement Learning; Adaptation; Multi-Objective Decision-Making

To reference this document use: http://resolver.tudelft.nl/uuid:bb9c9637-3b26-4d41-aad6-dfcdac928509

Part of collection: Student theses
Document type: master thesis
Rights: © 2024 LITIAN Li
Files: Multi-expert_Preference_A ... ian_Li.pdf (3.17 MB)
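The abstract's core idea, conditioning a single policy on a preference weight vector so that one network covers all trade-offs, can be sketched minimally as follows. This is an illustrative reconstruction under assumptions, not the thesis code: the `scalarize` and `policy_input` helpers, the linear scalarization, and the Dirichlet sampling of preferences are plausible choices for a preference-conditioned PPO setup, not details confirmed by this record.

```python
import numpy as np

def scalarize(reward_vec, pref):
    """Collapse a multi-objective reward vector into a scalar via linear
    scalarization with preference weights (assumed to sum to 1)."""
    return float(np.dot(np.asarray(reward_vec, float), np.asarray(pref, float)))

def policy_input(obs, pref):
    """Condition the policy on a preference by concatenating the preference
    weights onto the observation, so one network can act for any preference
    encountered during training."""
    return np.concatenate([np.asarray(obs, float), np.asarray(pref, float)])

# During training, a fresh preference vector would be sampled each episode
# (e.g. from a Dirichlet distribution) so the learned policy generalizes:
rng = np.random.default_rng(0)
pref = rng.dirichlet(np.ones(2))        # random trade-off over 2 objectives
obs = np.array([0.5, -1.0, 2.0])        # dummy 3-dimensional observation
x = policy_input(obs, pref)             # 5-dimensional conditioned input
r = scalarize([1.0, 3.0], [0.25, 0.75]) # -> 2.5
```

A preference shift then changes only the conditioning vector fed to the policy, not the policy parameters, which is what removes the retraining step described in the abstract.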