General-Sum Multi-Agent Continuous Inverse Optimal Control

Journal Article (2021)

Author(s)

Christian Neumeyer (TU Delft - Intelligent Vehicles, Mercedes-Benz)

Frans A. Oliehoek (TU Delft - Interactive Intelligence)

Dariu M. Gavrila (TU Delft - Intelligent Vehicles)

Research Group

Intelligent Vehicles

DOI related publication

https://doi.org/10.1109/LRA.2021.3060411

Reinforcement Learning, Inverse Reinforcement Learning, Learning from Demonstration

To reference this document use:

https://resolver.tudelft.nl/uuid:a5be5a49-470f-4a64-ad44-e4586beedd25

Publication Year

2021

Language

English

Issue number

2

Volume number

6

Pages (from-to)

3429-3436

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Modeling possible future outcomes of robot-human interactions is important in the intelligent vehicle and mobile robotics domains. Knowing the reward function that explains the observed behavior of a human agent is advantageous for modeling the behavior with Markov Decision Processes (MDPs). However, learning the rewards that determine the observed actions from data is complicated by interactions. We present a novel inverse reinforcement learning (IRL) algorithm that can infer the reward function in multi-agent interactive scenarios. In particular, the agents may be boundedly rational (i.e., sub-optimal), a characteristic that is typical of human decision making. Additionally, every agent optimizes its own reward function, which makes it possible to address non-cooperative setups. In contrast to other methods, the algorithm does not rely on reinforcement learning during inference of the parameters of the reward function. We demonstrate that our proposed method accurately infers the ground truth reward function in two-agent interactive experiments.

Files

09357891.pdf

(pdf | 0.636 MB)

- Embargo expired on 31-08-2021

License info not available