Hierarchical Reinforcement Learning for Spatio-temporal Planning


Abstract

Reinforcement learning (RL) is an area of Machine Learning (ML) concerned with learning how a software agent should act in an environment to maximize cumulative reward. Like many ML methods, RL suffers from the curse of dimensionality: the solution space grows exponentially with the number of problem dimensions. Learning the hierarchy present in the underlying problem, formulated in the Markov Decision Process (MDP) framework, makes it possible to exploit the inherent structure of the environment. Using this hierarchical structure, an MDP can be divided into several simpler semi-MDPs (SMDPs) with temporally extended actions, and the solutions of these smaller SMDPs can then be recombined into a solution for the original MDP. Hierarchical Reinforcement Learning (HRL) methods explore ways of breaking the original problem down into SMDPs while providing opportunities for state and temporal abstraction. This thesis proposes a novel algorithm for learning the hierarchical structure of a discrete-state, goal-oriented factored MDP (FMDP), taking into account the causal structure of the problem domain through a Dynamic Bayesian Network (DBN) model. The proposed method autonomously learns state and temporal abstractions in the problem domain and uses them to construct a hierarchy of SMDPs. This decomposition reduces the number of state dimensions that must be considered when solving each SMDP and thereby reduces the computational complexity induced by increased dimensionality.
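
As a rough illustration of the kind of state abstraction described above, the sketch below (a hypothetical toy domain, not the thesis algorithm) uses a DBN-style parent structure over state variables to compute the subset of variables relevant to a sub-task's goal; variables outside this subset can be ignored when solving that sub-task's SMDP.

```python
# Illustrative sketch only: deriving a state abstraction for one sub-task of a
# factored MDP from a DBN-style dependency structure. The domain, variable
# names, and function are hypothetical examples, not the proposed method.

from typing import Dict, Set

# Hypothetical one-step DBN: each state variable maps to the variables that
# its next-step value depends on in the transition model.
dbn_parents: Dict[str, Set[str]] = {
    "robot_pos": {"robot_pos"},
    "has_key":   {"robot_pos", "has_key"},
    "door_open": {"has_key", "door_open"},
    "battery":   {"battery"},
}

def relevant_variables(goal_vars: Set[str],
                       parents: Dict[str, Set[str]]) -> Set[str]:
    """Backward closure over the DBN: every variable that can influence the
    goal variables directly or transitively. The remaining variables can be
    abstracted away when solving the corresponding sub-task (SMDP)."""
    relevant = set(goal_vars)
    frontier = set(goal_vars)
    while frontier:
        var = frontier.pop()
        for parent in parents.get(var, set()):
            if parent not in relevant:
                relevant.add(parent)
                frontier.add(parent)
    return relevant

# A sub-task whose goal involves only "door_open" needs just part of the state:
print(relevant_variables({"door_open"}, dbn_parents))
# -> {'door_open', 'has_key', 'robot_pos'}  ("battery" is abstracted away)
```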