Multi-expert Preference Alignment in Reinforcement Learning


Abstract

This project explores adaptation to preference shifts in Multi-objective Reinforcement Learning (MORL), focusing on how Reinforcement Learning (RL) agents can align with the preferences of multiple experts. Such alignment may be required across scenarios with distinct expert preferences or within a single scenario whose preferences shift over time. Traditional RL requires retraining the policy whenever a new expert preference is introduced, which is computationally expensive and impractical. This project instead proposes a single-policy RL algorithm, Generalized Preference-based PPO (GPB PPO), which incorporates both environmental information and experts' preference requirements throughout the decision-making process. By exposing the agent to diverse preference scenarios during training, GPB PPO learns a policy conditioned on the preference and can generalize to any given preference, eliminating the need for explicit retraining or additional adaptation when preferences shift. The generalization and adaptation capabilities of GPB PPO are evaluated in both stationary and non-stationary environments.
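To make the idea of a preference-conditioned policy concrete, below is a minimal sketch in PyTorch. It assumes a discrete-action setting, a linear scalarization of the vector-valued reward, and Dirichlet sampling of preference weights during training; the network architecture and helper names are illustrative assumptions, not the project's actual implementation.

```python
import torch
import torch.nn as nn


class PreferenceConditionedPolicy(nn.Module):
    """Policy whose input is the state concatenated with a preference
    weight vector, so a single network can act under any preference."""

    def __init__(self, state_dim, num_objectives, num_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + num_objectives, hidden),
            nn.Tanh(),
            nn.Linear(hidden, hidden),
            nn.Tanh(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state, preference):
        # Condition the action distribution on the current preference.
        logits = self.net(torch.cat([state, preference], dim=-1))
        return torch.distributions.Categorical(logits=logits)


def scalarize(reward_vec, preference):
    # Linear scalarization (an assumption): project the vector-valued
    # reward onto the preference weights to get a single training signal
    # that a standard PPO update can consume.
    return (reward_vec * preference).sum(dim=-1)


if __name__ == "__main__":
    state_dim, num_objectives, num_actions = 8, 3, 4
    policy = PreferenceConditionedPolicy(state_dim, num_objectives, num_actions)

    # During training, sample a fresh preference per episode (e.g. from a
    # Dirichlet) so the agent is exposed to diverse preference scenarios.
    preference = torch.distributions.Dirichlet(torch.ones(num_objectives)).sample()
    state = torch.randn(state_dim)

    dist = policy(state, preference)
    action = dist.sample()
    reward_vec = torch.randn(num_objectives)  # placeholder vector reward
    scalar_reward = scalarize(reward_vec, preference)
    print(action.item(), scalar_reward.item())
```

At deployment, a new expert preference is simply passed to the same policy as an extra input; no retraining step is needed under this formulation.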