Conflicting demonstrations in Inverse Reinforcement Learning

Bachelor Thesis (2023)

Author(s)

R.M. Labbé (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Luciano C. Cavalcante Siebert – Mentor (TU Delft - Interactive Intelligence)

A. Caregnato Neto – Mentor (TU Delft - Interactive Intelligence)

J.M. Weber – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Inverse Reinforcement Learning, Maximum Entropy, Conflicting data

To reference this document use:

https://resolver.tudelft.nl/uuid:8a452b02-0237-4131-b47d-92244c9916b1

Publication Year

2023

Language

English

Graduation Date

29-06-2023

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper investigates the effect of conflicting demonstrations on Inverse Reinforcement Learning (IRL). IRL aims to infer the intent of an expert, expressed as a reward function, using only demonstrations of that expert's behavior. This makes it a promising approach for areas such as self-driving vehicles, where expert demonstrations are plentiful. However, demonstrations may not always come from the same expert, or the expert may prioritize different goals at different times. For example, a driver may not always do their grocery shopping at the same store, or may take a slightly different route on different occasions. The results show that severely conflicting demonstrations have a negative effect on the ability of Maximum Entropy IRL to recover rewards, although the results with more than two goals are slightly more optimistic.
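To make the setting concrete, the following is a minimal sketch, not the thesis's own code, of Maximum Entropy IRL in the style of Ziebart et al. (finite horizon, deterministic transitions, one-hot state features) applied to two conflicting demonstrators. The 5-cell chain world, the names N_STATES, HORIZON, START, NEXT, empirical_svf, expected_svf and maxent_irl, and the learning rate and iteration count are all hypothetical choices made for illustration. The gradient of the demonstration log-likelihood is the difference between the empirical and the expected state-visitation counts. The expected counts come from a backward soft value iteration pass followed by a forward pass.

```python
import numpy as np

# Hypothetical toy MDP (not from the thesis): a 1-D chain of N_STATES cells.
# Actions move left, stay, or move right; moving into a wall leaves the agent in place.
N_STATES = 5
ACTIONS = (-1, 0, 1)
HORIZON = 6   # number of states in every trajectory, s_0 .. s_{T-1}
START = 2     # all trajectories start in the middle cell

# NEXT[s, a] is the deterministic successor of state s under action index a.
NEXT = np.array([[min(max(s + a, 0), N_STATES - 1) for a in ACTIONS]
                 for s in range(N_STATES)])


def empirical_svf(demos):
    """Average number of visits to each state per demonstration."""
    counts = np.zeros(N_STATES)
    for traj in demos:
        for s in traj:
            counts[s] += 1
    return counts / len(demos)


def expected_svf(reward):
    """Expected state-visitation counts under the finite-horizon MaxEnt policy."""
    # Backward pass: soft value iteration, one stochastic policy per time step.
    v = reward.copy()                          # V_{T-1}(s) = r(s)
    policies = []
    for _ in range(HORIZON - 1):
        q = reward[:, None] + v[NEXT]          # Q_t(s, a) = r(s) + V_{t+1}(s')
        m = q.max(axis=1)
        v = m + np.log(np.exp(q - m[:, None]).sum(axis=1))  # log-sum-exp over actions
        policies.append(np.exp(q - v[:, None]))             # pi_t(a | s), rows sum to 1
    policies.reverse()                         # policies[t] now acts at time t
    # Forward pass: propagate the state distribution and accumulate visits.
    d = np.zeros(N_STATES)
    d[START] = 1.0
    svf = d.copy()
    for pi in policies:
        d_next = np.zeros(N_STATES)
        np.add.at(d_next, NEXT, d[:, None] * pi)  # scatter probability mass to successors
        d = d_next
        svf += d
    return svf


def maxent_irl(demos, lr=0.1, iters=300):
    """Gradient ascent on the demonstration log-likelihood (one-hot state features)."""
    reward = np.zeros(N_STATES)
    target = empirical_svf(demos)
    for _ in range(iters):
        reward += lr * (target - expected_svf(reward))
    return reward


# Two conflicting "experts": one heads for the left end, the other for the right end.
left = [2, 1, 0, 0, 0, 0]
right = [2, 3, 4, 4, 4, 4]
r = maxent_irl([left] * 5 + [right] * 5)
print(np.round(r, 2))
```

A single reward function has to explain both demonstrators, so in this toy example the recovered reward is high at both ends of the chain and low in the middle, rather than singling out one goal. Calling maxent_irl([left] * 10) instead yields a reward that clearly favors state 0 over state 4.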

Files

CSE3000_Final_Paper_5_.pdf

(pdf | 0.83 Mb)

License info not available