On the road from Model-Based Dynamical Programming to Model-Free Reinforcement Learning

A sample efficient approach

Master Thesis (2023)
Author(s)

P. Mur Uribe (TU Delft - Mechanical Engineering)

Contributor(s)

P. Mohajerin Esfahani – Mentor (TU Delft - Team Peyman Mohajerin Esfahani)

M. A.S. Kolarijani – Mentor (TU Delft - Team Peyman Mohajerin Esfahani)

Faculty
Mechanical Engineering
Copyright
© 2023 Pol Mur Uribe
More Info
Publication Year
2023
Language
English
Graduation Date
29-03-2023
Awarding Institution
Delft University of Technology
Programme
Mechanical Engineering | Systems and Control
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This thesis introduces Mixed Iteration, a new method for controlling Markov Decision Processes (MDPs) when the dynamics are only partially known. The algorithm uses sampling to compute expectations under the partially known dynamics in stochastic environments. Its goal is to reduce the number of iterations and computational steps required for convergence compared to traditional model-free algorithms. By lowering the number of samples required for convergence, MDPs can be controlled and trained more efficiently. The thesis also discusses how the algorithm can improve the sample efficiency and convergence rate of Reinforcement Learning algorithms such as Q-learning. The proposed method is evaluated on standard Reinforcement Learning problems and its performance is compared with that of Q-learning. The results show that, under certain conditions discussed in the thesis, the proposed algorithm outperforms classical algorithms in terms of sample efficiency. The study provides insight into the use of prior partial information in Reinforcement Learning, as well as the challenges that researchers in this field continue to face.
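
For readers unfamiliar with the two endpoints named in the title, the following minimal sketch contrasts a model-based backup, where the expectation over next states is computed exactly from known transition probabilities, with tabular Q-learning, where the same expectation is approximated from one sampled transition per update. It is not the Mixed Iteration algorithm, which the abstract does not specify. The two-state MDP, the names `P`, `R`, `GAMMA`, `model_based_q` and `model_free_q`, and all parameter values are assumptions chosen purely for illustration.

```python
import random

# Hypothetical 2-state, 2-action MDP (illustration only, not from the thesis).
# P[s][a][s2] is the probability of moving from state s to s2 under action a.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.0, 1.0]],
]
# R[s][a] is the immediate reward for taking action a in state s.
R = [
    [0.0, 0.0],
    [1.0, 0.5],
]
GAMMA = 0.5  # discount factor (assumed value)


def model_based_q(P, R, gamma, sweeps=100):
    """Q-value iteration: the expectation over next states uses P exactly."""
    n_s, n_a = len(R), len(R[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(sweeps):
        V = [max(row) for row in Q]  # greedy state values
        Q = [[R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
              for a in range(n_a)] for s in range(n_s)]
    return Q


def model_free_q(P, R, gamma, alpha=0.02, steps=50_000, seed=0):
    """Tabular Q-learning: P acts only as a simulator that yields one sample."""
    rng = random.Random(seed)
    n_s, n_a = len(R), len(R[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(steps):
        # Draw a state-action pair uniformly, then sample a single next state.
        s, a = rng.randrange(n_s), rng.randrange(n_a)
        s2 = rng.choices(range(n_s), weights=P[s][a])[0]
        target = R[s][a] + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])  # stochastic approximation step
    return Q


q_exact = model_based_q(P, R, GAMMA)
q_sampled = model_free_q(P, R, GAMMA)
max_gap = max(abs(q_exact[s][a] - q_sampled[s][a])
              for s in range(2) for a in range(2))
print("exact:  ", q_exact)
print("sampled:", q_sampled)
print("largest absolute gap:", max_gap)
```

The model-based routine needs the full transition table but no samples, while the sampled routine needs many transitions to reach a similar estimate. That gap in sample cost is the quantity the abstract refers to as sample efficiency.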

Files

Master_Thesis_Final.pdf
(pdf | 12.7 Mb)
License info not available