Inverse Reinforcement Learning (IRL) in the Presence of Risk- and Uncertainty-Related Cognitive Biases

To what extent can IRL learn rewards from expert demonstrations exhibiting loss and risk aversion?

Bachelor Thesis (2023)
Author(s)

M. Ikiz (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. Caregnato Neto – Mentor (TU Delft - Interactive Intelligence)

Luciano C. Siebert – Mentor (TU Delft - Interactive Intelligence)

J.M. Weber – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Meric Ikiz
Publication Year
2023
Language
English
Graduation Date
29-06-2023
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

A key issue in Reinforcement Learning (RL) research is the difficulty of defining reward functions. Inverse Reinforcement Learning (IRL) addresses this challenge by learning rewards from expert demonstrations. In realistic settings, expert demonstrations are collected from humans, and these demonstrations can deviate from rationality due to systematic deviations known as cognitive biases. One group of cognitive biases, the risk-sensitive biases, concerns individuals' attitudes and behaviors towards risk and uncertainty. This paper investigates the extent to which IRL can learn from demonstrations that contain risk-sensitive cognitive biases such as loss aversion and risk aversion. Modelling the biases with concepts from Prospect Theory and the System 1 and 2 (dual-process) model, and using the Maximum Entropy IRL algorithm, the paper concludes that IRL can recreate solutions similar to the experts', but that inferring the underlying motivations and the interactions between them is an intricate problem requiring novel approaches.
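
To make the bias model concrete: loss and risk aversion are commonly captured by the Prospect Theory value function, which is concave over gains, convex over losses, and steeper for losses than for gains. The sketch below is illustrative only and is not code from the thesis; the parameter values are the widely cited estimates from Tversky and Kahneman (1992), not necessarily those used here.

```python
import numpy as np

# Commonly cited Prospect Theory parameter estimates (illustrative,
# not the thesis's values):
ALPHA = 0.88    # curvature over gains  -> risk aversion for gains
BETA = 0.88     # curvature over losses -> risk seeking for losses
LAMBDA = 2.25   # loss-aversion coefficient: losses loom larger than gains

def pt_value(x):
    """Subjective value of outcome x relative to a reference point of 0."""
    x = np.asarray(x, dtype=float)
    gains = np.clip(x, 0.0, None) ** ALPHA            # v(x) = x^alpha,  x >= 0
    losses = -LAMBDA * np.clip(-x, 0.0, None) ** BETA  # v(x) = -lambda*(-x)^beta, x < 0
    return np.where(x >= 0.0, gains, losses)

print(pt_value(10.0))    # ~7.59
print(pt_value(-10.0))   # ~-17.07
```

Under this value function a loss of 10 is felt roughly 2.25 times as strongly as an equivalent gain, which is exactly the distortion that biased demonstrations carry into the reward-learning step.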
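
For the learning side, the abstract names the Maximum Entropy IRL algorithm (Ziebart et al., 2008). Below is a minimal sketch of its gradient update for a tabular MDP with linear rewards; the names `features`, `expert_svf`, and `compute_svf` are hypothetical, and the soft-optimal policy computation behind `compute_svf` is deliberately left abstract.

```python
import numpy as np

def maxent_irl_step(theta, features, expert_svf, compute_svf, lr=0.1):
    """One gradient ascent step on the MaxEnt IRL log-likelihood.

    theta:       (d,) current reward weights
    features:    (n_states, d) state feature matrix
    expert_svf:  (n_states,) empirical state visitation frequencies
                 estimated from the (possibly biased) demonstrations
    compute_svf: function mapping a reward vector to the expected state
                 visitation frequencies under the soft-optimal policy
    """
    rewards = features @ theta
    expected_svf = compute_svf(rewards)
    # Gradient: expert feature expectations minus the model's expectations.
    grad = features.T @ (expert_svf - expected_svf)
    return theta + lr * grad
```

When the expert visitation frequencies come from loss- or risk-averse behaviour, this update still matches them, which is why IRL can reproduce the experts' solutions without recovering their underlying motivations.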

Files

BachelorThesis.pdf
(pdf | 1.74 MB)
License info not available