Reinforcement Learning in Block Markov Chains

Abstract

Reinforcement learning algorithms on Markov decision processes (MDPs) face computational issues when the state space is large. To reduce the state space of an MDP, several state aggregation, or clustering, methodologies have been applied. Recently, a new clustering algorithm was proposed that can recover the clusters of a block Markov chain from a single trajectory. A block Markov chain is a Markov chain whose transition matrix has a block structure, where the blocks correspond to clusters. Our aim was to investigate whether state aggregation in reinforcement learning on MDPs can be combined with clustering of states on a block Markov chain. First, we studied the clustering algorithm and its properties to assess its performance for different parameters and trajectory lengths. We compared the statistical properties of a pure Markov chain with those of the mixed Markov chain generated by an MDP, and then verified the performance of the clustering algorithm on this mixed Markov chain. We proposed the BMC-MDP model, which captures cluster-based MDPs, and C-PSRL, an algorithm on this newly introduced model that consists of a single clustering step followed by MDP solving on the reduced space. We compared its performance with a naïve approach and concluded that this combined approach of clustering and MDP solving on a reduced space is viable and significantly reduces the computational complexity. This research opens up possibilities for more complex algorithms with, for example, multiple clustering steps. Moreover, if the clustering algorithm can be extended to cluster based on a state and action trajectory, this may increase the clustering performance and thereby enhance the performance of this general approach of optimizing on a cluster-based MDP.
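
To illustrate the block structure mentioned above, the following is a minimal NumPy sketch of a block Markov chain: transition probabilities depend only on the clusters of the source and target states. The function names, the uniform spreading of probability mass within a target cluster, and the example cluster-level matrix `Q` are illustrative assumptions, not the exact construction used in this work.

```python
import numpy as np

def block_markov_chain(cluster_sizes, Q):
    """Build a state-level transition matrix P with block structure.

    P[i, j] = Q[c(i), c(j)] / |c(j)|, so each block of P is constant and
    the rows of P sum to one whenever the rows of Q sum to one.
    """
    labels = np.repeat(np.arange(len(cluster_sizes)), cluster_sizes)
    sizes = np.asarray(cluster_sizes)
    P = Q[np.ix_(labels, labels)] / sizes[labels][None, :]
    return P, labels

def sample_trajectory(P, length, seed=None):
    """Sample a single state trajectory from the transition matrix P."""
    rng = np.random.default_rng(seed)
    states = np.empty(length, dtype=int)
    states[0] = rng.integers(P.shape[0])
    for t in range(1, length):
        states[t] = rng.choice(P.shape[0], p=P[states[t - 1]])
    return states

# Example: 3 clusters of sizes 4, 3, 3 and a cluster-level transition matrix Q.
Q = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.2, 0.7]])
P, labels = block_markov_chain([4, 3, 3], Q)
trajectory = sample_trajectory(P, length=1000, seed=0)
```

A clustering algorithm for block Markov chains takes such a single trajectory as input and attempts to recover `labels` without knowing `Q` or the partition in advance.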