Investigating the extent to which inverse reinforcement learning can learn rewards from noisy demonstrations

Bachelor Thesis (2023)
Author(s)

C. Perdikis (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Luciano C. Siebert – Mentor (TU Delft - Interactive Intelligence)

A. Caregnato Neto – Mentor (TU Delft - Interactive Intelligence)

J.M. Weber – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Charalampos Perdikis
More Info
Publication Year
2023
Language
English
Graduation Date
29-06-2023
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Inverse Reinforcement Learning (IRL) aims to recover a reward function from expert demonstrations in a Markov Decision Process (MDP). The objective is to understand the underlying intentions and behaviors of experts and to derive a reward function based on their reasoning, rather than their exact actions. However, expert demonstrations can be affected by various types of noise (e.g., from random behavior), which can reduce their accuracy and effectiveness in solving the MDP. This research investigates the capability of IRL to recover reward functions from noisy demonstrations. Three types of noise, namely Random Action Noise, Random Bias Noise, and Sparse Noise, are introduced and modeled. Demonstrations are generated with each type of noise, and the corresponding reward functions are recovered. The rewards recovered from noisy demonstrations are then compared with those recovered from optimal demonstrations using various metrics. The results indicate that IRL exhibits a certain level of tolerance to Random Events and Sparse Noise, while being more vulnerable to Random Bias Noise.
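
As an illustration of what a noisy demonstration could look like, the following is a minimal Python sketch of one plausible form of random action noise. It is not the thesis's implementation: the function name `add_random_action_noise`, the parameter `epsilon`, and the representation of a trajectory as a list of (state, action) pairs are all assumptions made here. For simplicity the sketch only relabels actions and leaves the visited states unchanged, whereas a full simulation would roll the noisy action out in the environment.

```python
import random


def add_random_action_noise(trajectory, n_actions, epsilon, rng=None):
    """Return a noisy copy of `trajectory` (hypothetical helper).

    Each (state, action) pair keeps its state; with probability `epsilon`
    the action is replaced by one drawn uniformly from range(n_actions).
    """
    if rng is None:
        rng = random.Random()
    noisy = []
    for state, action in trajectory:
        if rng.random() < epsilon:
            # Overwrite the expert's action with a uniformly random one.
            action = rng.randrange(n_actions)
        noisy.append((state, action))
    return noisy


# Toy expert trajectory: (state index, action index) pairs.
expert = [(0, 1), (1, 1), (2, 0), (3, 1)]
noisy = add_random_action_noise(expert, n_actions=4, epsilon=0.3,
                                rng=random.Random(0))
```

With `epsilon=0.0` the demonstration is returned unchanged, and with `epsilon=1.0` every action is resampled, so the parameter spans the range from optimal to fully random behavior.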

Files

CSE3000_Final_Paper.pdf
(pdf | 0.501 MB)
License info not available