Effects of action space discretization and DQN extensions on algorithm robustness and efficiency

How do the discretization of the action space and various extensions to the well-known DQN algorithm influence training and the robustness of final policies under various testing conditions?

Bachelor Thesis (2023)

Author(s)

M.A. Sözüdüz (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M. T.J. Spaan – Mentor (TU Delft - Algorithmics)

M.A. Zanger – Mentor (TU Delft - Algorithmics)

E. Congeduti – Graduation committee member (TU Delft - Computer Science & Engineering-Teaching Team)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Reinforcement Learning (RL), Deep Q Network, DQN Extensions, Action Space Discretization, Automated driving systems

To reference this document use:

https://resolver.tudelft.nl/uuid:1cb69801-2c42-434d-96b4-3984a2c19d6f

More Info

Publication Year

2023

Language

English

Graduation Date

28-06-2023

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Reinforcement Learning (RL) has gained attention as a way of creating autonomous agents for self-driving cars. This paper explores the adaptation of the Deep Q Network (DQN), a popular deep RL algorithm, in the Carla traffic simulator for autonomous driving. It investigates the influence of action space discretization and DQN extensions on training performance and robustness. Results show that action space discretization enhances behaviour consistency but negatively affects Q-values, training performance, and robustness. Double Q-Learning decreases training performance and leads to suboptimal convergence, reducing robustness. Prioritized Experience Replay also performs worse during training, but consistently outperforms in robustness testing, reward estimation and generalization.

Files

Research.pdf

(pdf | 0.679 Mb)

License info not available