Hierarchical Clustering Based State Abstraction In Reinforcement Learning

Master Thesis (2022)
Author(s)

Y. Liu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Y. Li – Mentor (TU Delft - Algorithmics)

Matthijs T. J. Spaan – Graduation committee member (TU Delft - Algorithmics)

Simon H. Tindemans – Coach (TU Delft - Intelligent Electrical Power Grids)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2022 Yang Liu
Publication Year
2022
Language
English
Graduation Date
30-08-2022
Awarding Institution
Delft University of Technology
Project
MEDIATOR Project
Programme
Electrical Engineering | Embedded Systems
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Reinforcement learning (RL) has grown tremendously over the past decade and a half and is increasingly appearing in real-life applications. However, its application is still limited by low training efficiency and high training cost. Sampling and computational complexity normally depend on the size of the state space, so splitting the state space can distribute computation and accelerate learning. State abstraction, a data-centric method, shrinks the state space and reduces learning time; however, abstraction discards information and might therefore result in a sub-optimal solution. In this thesis, we propose the hierarchical clustering-based state grouping (HCSG) method, which splits the ground state space into clusters and trains multiple agents for each cluster without changing the dimension of the state space. This approach distributes computation and improves training efficiency without degrading overall performance, and it was also shown to outperform the baseline and other state-of-the-art data-centric methods.
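
The general idea of grouping ground states by hierarchical clustering and routing each state to a cluster-specific learner can be illustrated with a minimal sketch. This is not the thesis implementation: the toy grid states, the Manhattan distance, the single-linkage merge rule, the one-Q-table-per-cluster layout, and every name below (`agglomerative_clusters`, `cluster_of`, `q_update`) are assumptions chosen for illustration only. Note that each state keeps its full coordinates, so the dimension of the state space is unchanged.

```python
from itertools import combinations

# Hypothetical toy problem: 2-D grid states forming two distant regions.
STATES = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
ACTIONS = ["up", "down", "left", "right"]


def agglomerative_clusters(points, k):
    """Bottom-up (agglomerative) clustering: start with one cluster per
    point and repeatedly merge the two closest clusters until k remain.
    Returns a list of clusters, each a list of indices into `points`."""
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        # Single linkage: distance between the closest pair of members,
        # measured with the Manhattan metric (an assumption of this sketch).
        return min(
            sum(abs(x - y) for x, y in zip(points[i], points[j]))
            for i in a
            for j in b
        )

    while len(clusters) > k:
        # combinations() yields index pairs (a, b) with a < b.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: dist(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


clusters = agglomerative_clusters(STATES, k=2)

# Map every ground state to the index of its cluster.
cluster_of = {STATES[i]: c for c, members in enumerate(clusters) for i in members}

# One tabular Q-function per cluster; states are stored unabstracted.
q_tables = [
    {(STATES[i], a): 0.0 for i in members for a in ACTIONS}
    for members in clusters
]


def q_update(state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update, applied to the table of the
    cluster that owns `state`. Bootstrapping from the table that owns
    `next_state` is an assumption about how cross-cluster transitions
    could be handled."""
    table = q_tables[cluster_of[state]]
    next_table = q_tables[cluster_of[next_state]]
    best_next = max(next_table[(next_state, a)] for a in ACTIONS)
    td_error = reward + gamma * best_next - table[(state, action)]
    table[(state, action)] += alpha * td_error


# Usage: a single transition inside the first region.
q_update((0, 0), "right", 1.0, (0, 1))
```

Because each update touches only the table of one cluster, the per-cluster learners could in principle be trained separately, which is the sense in which splitting the state space distributes computation.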

Files

MSc_Thesis_Paper_Final.pdf
(pdf | 2.92 Mb)
License info not available