Combining Multi-Objective Planning with Reinforcement Learning to Solve Complex Tasks in Environments with Sparse Rewards

Master Thesis (2023)

Author(s)

C. van Rijn (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Anna Lukina – Mentor (TU Delft - Algorithmics)

Matthijs Spaan – Graduation committee member (TU Delft - Algorithmics)

F.A. Oliehoek – Graduation committee member (TU Delft - Interactive Intelligence)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Sparse rewards, Planning, Reinforcement learning, Sequential decision making, LTL tasks

To reference this document use:

https://resolver.tudelft.nl/uuid:1b0a4da5-d239-433b-b428-c927049e9055

More Info

Publication Year

2023

Language

English

Graduation Date

21-03-2023

Awarding Institution

Delft University of Technology

Programme

Computer Science, Electrical Engineering | Embedded Systems

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Sequential decision-making problems are problems in which the goal is to find a sequence of actions that completes a task in an environment. A particularly difficult type of sequential decision-making problem is one in which the environment has sparse rewards and a large state space, and the goal is to complete a complex task. In this research, we create a controller that can solve problems of this type in cases where the task needs to be optimized for multiple objectives. We create MOPRL, an approach that combines techniques from planning, formal methods, and reinforcement learning to synthesize such a controller. W

Files

Thesis_Final_Cas_van_Rijn.pdf

(pdf | 2.01 Mb)

License info not available