Hierarchical Reinforcement Learning for Spatio-temporal Planning


Abstract

Reinforcement learning (RL) is an area of Machine Learning (ML) concerned with learning how a software agent should act in an environment to maximize cumulative reward. Like many ML methods, RL suffers from the curse of dimensionality: the solution space grows exponentially with the number of problem dimensions. Learning the hierarchy present in the underlying problem, formulated in the Markov Decision Process (MDP) framework, makes it possible to exploit the inherent structure of the environment. Using this hierarchical structure, an MDP can be divided into several simpler semi-MDPs (SMDPs) with temporally extended actions, and the solutions of these smaller SMDPs can then be recombined into a solution for the original MDP. Hierarchical Reinforcement Learning (HRL) methods explore ways of breaking the original problem down into SMDPs while providing opportunities for state and temporal abstraction. This thesis proposes a novel algorithm for learning the hierarchical structure of a discrete-state, goal-oriented factored MDP (FMDP), taking into account the causal structure of the problem domain through a Dynamic Bayesian Network (DBN) model. The proposed method autonomously learns state and temporal abstractions in the problem domain and uses them to construct a hierarchy of SMDPs. This decomposition reduces the number of state dimensions that must be considered when solving each SMDP and thereby reduces the computational complexity induced by increased dimensionality.
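
As a rough illustration of the kind of state abstraction described above, the sketch below (a hypothetical toy domain, not the thesis algorithm) uses a DBN-style parent structure over state variables to compute the subset of variables relevant to a sub-task's goal; variables outside this subset can be ignored when solving that sub-task's SMDP.

```python
# Illustrative sketch only: deriving a state abstraction for one sub-task of a
# factored MDP from a DBN-style dependency structure. The domain, variable
# names, and function are hypothetical examples, not the proposed method.

from typing import Dict, Set

# Hypothetical one-step DBN: each state variable maps to the variables that
# its next-step value depends on in the transition model.
dbn_parents: Dict[str, Set[str]] = {
    "robot_pos": {"robot_pos"},
    "has_key":   {"robot_pos", "has_key"},
    "door_open": {"has_key", "door_open"},
    "battery":   {"battery"},
}

def relevant_variables(goal_vars: Set[str],
                       parents: Dict[str, Set[str]]) -> Set[str]:
    """Backward closure over the DBN: every variable that can influence the
    goal variables directly or transitively. The remaining variables can be
    abstracted away when solving the corresponding sub-task (SMDP)."""
    relevant = set(goal_vars)
    frontier = set(goal_vars)
    while frontier:
        var = frontier.pop()
        for parent in parents.get(var, set()):
            if parent not in relevant:
                relevant.add(parent)
                frontier.add(parent)
    return relevant

# A sub-task whose goal involves only "door_open" needs just part of the state:
print(relevant_variables({"door_open"}, dbn_parents))
# -> {'door_open', 'has_key', 'robot_pos'}  ("battery" is abstracted away)
```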