- document
Peschl, M. (author), Zgonnikov, A. (author), Oliehoek, F.A. (author), Cavalcante Siebert, L. (author)

Inferring reward functions from demonstrations and pairwise preferences is a promising approach for aligning Reinforcement Learning (RL) agents with human intentions. However, state-of-the-art methods typically focus on learning a single reward model, thus rendering it difficult to trade off different reward functions from multiple experts. We...

conference paper 2022
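As a rough illustration of the preference-based reward inference the abstract refers to (this is a generic Bradley-Terry sketch, not the authors' method; all names and the synthetic data are invented for the example), one can fit a linear reward model from pairwise trajectory comparisons by gradient ascent on the preference log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_reward_from_preferences(features_a, features_b, prefs, lr=0.5, steps=200):
    """Fit weights w of a linear reward r(tau) = w . phi(tau).

    features_a, features_b: (N, d) feature vectors of trajectory pairs.
    prefs[i] = 1.0 if trajectory a was preferred over b, else 0.0.
    Uses the Bradley-Terry model P(a > b) = sigmoid(w . (phi_a - phi_b)).
    """
    w = np.zeros(features_a.shape[1])
    for _ in range(steps):
        diff = features_a - features_b
        p = 1.0 / (1.0 + np.exp(-diff @ w))        # predicted preference prob.
        grad = diff.T @ (prefs - p) / len(prefs)   # log-likelihood gradient
        w += lr * grad
    return w

# Synthetic check: labels generated by a hidden weight vector.
true_w = np.array([1.0, -2.0])
fa = rng.normal(size=(500, 2))
fb = rng.normal(size=(500, 2))
prefs = ((fa - fb) @ true_w > 0).astype(float)
w = fit_reward_from_preferences(fa, fb, prefs)
# The recovered direction should align with the hidden weights.
cos = w @ true_w / (np.linalg.norm(w) * np.linalg.norm(true_w))
print(round(cos, 2))
```

A single reward model like this cannot by itself arbitrate between conflicting preference data from multiple experts, which is the limitation the paper highlights.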