Effects of Partial Observability Solver Methods on Training and Final Policies in Autonomous Driver RL

How do different methods for dealing with partial observability in the environment influence training and the robustness of final policies under various testing conditions?

Bachelor Thesis (2023)
Author(s)

A.E. Çil (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M.A. Zanger – Mentor (TU Delft - Algorithmics)

M.T.J. Spaan – Mentor (TU Delft - Algorithmics)

E. Congeduti – Graduation committee member (TU Delft - Computer Science & Engineering-Teaching Team)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Ata Çil
Publication Year
2023
Language
English
Graduation Date
28-06-2023
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Autonomous driving is a complex problem that can potentially be solved using artificial intelligence. The complexity stems from the system's need to understand its surroundings and make appropriate decisions. Building such a sophisticated system poses several challenges. One of the main challenges is making the agent learn from environmental input: because the environment is only partially observable and constantly changing, the agent must be flexible enough to extract the important information from that input. To create such an agent, this paper focuses on three methods: frame stacking, long short-term memory (LSTM), and a combination of the two. To analyze these methods, three Deep Q-Network (DQN) based agents are created, one per method, each addressing the partial observability of the autonomous driving problem. Their training performance is analyzed, and the results reveal significantly different trends in the training and evaluation phases.
In particular, the experiments show that LSTM is the more robust method but achieves lower performance than DQN with frame stacking, revealing a trade-off between these two qualities of an agent. The agent combining LSTM and frame stacking learned faster at the beginning, but because it was unstable, it ended the training run with a lower return. Such instability can be a problem in the real world, especially in autonomous driving, where robust implementations are essential.
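The frame-stacking idea the abstract refers to can be sketched in a few lines: the agent's input becomes the concatenation of the last k observations, giving a feedforward DQN a short memory window over a partially observable environment. This is a minimal illustrative sketch only; the class name, buffer size, and observation shapes are assumptions, not the implementation used in the thesis.

```python
from collections import deque

import numpy as np


class FrameStack:
    """Keep the last k observations and expose them as one stacked array.

    A feedforward DQN fed this stack can infer short-term dynamics
    (e.g. velocities of nearby vehicles) that a single frame hides.
    """

    def __init__(self, k: int):
        self.k = k
        self.frames: deque = deque(maxlen=k)

    def reset(self, obs: np.ndarray) -> np.ndarray:
        # At episode start, fill the buffer with copies of the first frame.
        self.frames.clear()
        self.frames.extend([obs] * self.k)
        return self._stacked()

    def step(self, obs: np.ndarray) -> np.ndarray:
        # Push the newest frame; the oldest falls out automatically (maxlen=k).
        self.frames.append(obs)
        return self._stacked()

    def _stacked(self) -> np.ndarray:
        # Stack along a new leading axis: shape (k, *obs_shape).
        return np.stack(self.frames, axis=0)
```

The LSTM-based agent takes the opposite approach: instead of widening the input, it keeps a learned hidden state across timesteps, which is why it can remain robust without an explicit frame buffer.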

Files

Final_Paper_12_.pdf
(pdf | 0.634 Mb)
License info not available