Reinforcement Learning Across Timescales

Master Thesis (2017)
Author(s)

S. Ravi (TU Delft - Mechanical Engineering)

Contributor(s)

J. Kober – Mentor

T.D. de Bruin – Mentor

J.W. van Wingerden – Graduation committee member

Sjoerd Boersma – Graduation committee member

Faculty
Mechanical Engineering
Copyright
© 2017 Siddharth Ravi
Publication Year
2017
Language
English
Graduation Date
18-08-2017
Awarding Institution
Delft University of Technology
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This project addresses a fundamental problem faced by many reinforcement learning agents: commonly used agents perform increasingly poorly as their control frequency increases, because they fail to learn the correct ordering of expected returns over the available actions. We call this the disappearing reinforcements problem. Moreover, truly multi-task reinforcement learning is only possible when agents can operate across frequencies, since different platforms run at different rates. Most algorithms from control theory, by contrast, show improved performance on similar tasks when their operating frequency is increased. This suggests that addressing disappearing reinforcements should give reinforcement learning agents better performance and generalization across timescales and tasks.
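The effect described above can be illustrated with a toy calculation (not taken from the thesis; the MDP, reward values, and decay rate below are assumptions for illustration only). When a continuous-time problem is discretised at timestep dt, per-step rewards scale with dt while state values stay roughly constant, so the gap between the Q-values of different actions shrinks linearly with dt and the ordering of returns becomes hard to learn:

```python
import math

def action_gap(dt, r_a=1.0, r_b=0.0, r_cont=0.5, decay=0.1):
    """Action gap in a toy MDP discretised at timestep dt.

    Two actions earn different immediate rewards (scaled by dt) but lead
    to the same successor state, whose value is the discounted sum of a
    constant continuation reward. Per-step discount gamma = exp(-decay*dt).
    Returns (Q-gap between the actions, Q-value of the better action).
    """
    gamma = math.exp(-decay * dt)
    v_next = (r_cont * dt) / (1.0 - gamma)  # value of the shared successor
    q_a = r_a * dt + gamma * v_next
    q_b = r_b * dt + gamma * v_next
    return q_a - q_b, q_a

# The gap is exactly (r_a - r_b) * dt, while Q stays on the order of
# r_cont / decay, so the relative gap vanishes as dt -> 0.
gap_slow, q_slow = action_gap(dt=1.0)
gap_fast, q_fast = action_gap(dt=0.01)
```

Here `gap_fast` is a hundred times smaller than `gap_slow`, while `q_fast` and `q_slow` are of comparable magnitude, so at high frequencies the signal that distinguishes actions is dwarfed by noise in the value estimates.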

In this project, we show that disappearing reinforcements occurs independently of the function approximator used in reinforcement learning, and is instead of a more fundamental nature. We explore, both theoretically and empirically, the relationship between agents and their performance at increasing frequencies. We show that two specific types of agents from the literature address the problem, and we benchmark their performance with novel measures inspired by control theory. Finally, we combine both state-of-the-art approaches into a novel agent we call the dueling advantage learner. We then benchmark the different agents across frequencies and tasks, and our agent outperforms each of the individual approaches on the majority of the tasks.
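The name "dueling advantage learner" suggests combining a dueling value/advantage decomposition (Wang et al.) with the advantage-learning operator (Baird; Bellemare et al.), which widens the action gap by pushing suboptimal actions further below the state value. The tabular sketch below shows how these two ingredients fit together; it is an assumption about the combination, not the thesis's actual implementation, and all names and constants are illustrative:

```python
import numpy as np

def dueling_q(value, advantages):
    """Dueling aggregation: Q = V + A - mean(A).

    Subtracting the mean advantage keeps the decomposition identifiable,
    so the value and advantage streams cannot drift apart arbitrarily.
    """
    return value + advantages - advantages.mean()

def al_target(q, s, a, r, s_next, gamma=0.99, alpha=0.9):
    """Advantage-learning target for a tabular Q array of shape (S, A).

    Starts from the ordinary Bellman target, then subtracts a fraction
    alpha of the gap between the state value max_a Q(s, a) and Q(s, a).
    Greedy actions are unchanged; suboptimal ones are pushed down,
    enlarging the action gap that shrinks at high frequencies.
    """
    bellman = r + gamma * q[s_next].max()
    return bellman - alpha * (q[s].max() - q[s, a])
```

For example, with `q = np.array([[1.0, 0.5], [0.2, 0.3]])`, the target for the greedy action in state 0 equals the plain Bellman target, while the target for the other action is lowered by `alpha * 0.5`.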

Files

MasterThesis_Siddharth_Ravi.pd... (pdf, 1.38 MB)
Embargo expired on 31-08-2017
License info not available