Flexible Curriculum Learning for Deep Reinforcement Learning Agents

Master Thesis (2018)
Author(s)

J.L. Dorscheidt (TU Delft - Aerospace Engineering)

Contributor(s)

Erik-Jan van Kampen – Mentor

Faculty
Aerospace Engineering
Copyright
© 2018 Joost Dorscheidt
Publication Year
2018
Language
English
Graduation Date
23-11-2018
Awarding Institution
Delft University of Technology
Programme
Aerospace Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Reinforcement Learning (RL) is a learning paradigm in which an agent learns by interacting with its environment. In practice, an RL agent needs to perform many actions to sample rewards and state transitions from its environment. Recent advances in using deep neural networks as function approximators reduce the sample complexity in very high-dimensional environments. Another way of reducing the sample complexity of reinforcement learning agents is to shape the learning path into a curriculum. Flexible curriculum learning is a form of shaping in which the action and/or state dimensionality of the agent is gradually increased, in an effort to slowly raise the complexity of the problem. In this research, a deep reinforcement learning architecture, Deep Deterministic Policy Gradients (DDPG), is modified to learn in this flexible way. The flexible DDPG algorithm is first evaluated on a problem where no coordination is required, to test its basic functionality, and then on a problem where coordination is required. Results show that the flexible DDPG algorithm has a sample complexity comparable to that of a non-flexible approach.
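
The thesis itself details the exact modification to DDPG; purely as an illustration of the curriculum idea described above, the following minimal Python sketch shows one way a flexible action-dimensionality schedule could be expressed, where not-yet-active action dimensions are masked to zero and unlocked stage by stage. All names here (FlexibleActionSchedule, stage_episodes, active_dims) are invented for this example and are not taken from the thesis.

    import numpy as np

    class FlexibleActionSchedule:
        """Gradually unlocks action dimensions as training progresses.

        Inactive dimensions are masked to zero, so the agent first
        solves a lower-dimensional version of the task before the
        full action space is made available.
        """

        def __init__(self, full_dim, stage_episodes):
            self.full_dim = full_dim          # dimensionality of the full action space
            self.stage_episodes = stage_episodes  # episodes per curriculum stage

        def active_dims(self, episode):
            # One extra action dimension becomes active after each stage.
            return min(self.full_dim, 1 + episode // self.stage_episodes)

        def mask(self, action, episode):
            k = self.active_dims(episode)
            masked = np.array(action, dtype=float)
            masked[k:] = 0.0  # zero out dimensions that are not yet active
            return masked

    # Usage: a 4-D action space unlocked one dimension at a time.
    schedule = FlexibleActionSchedule(full_dim=4, stage_episodes=100)
    raw_action = np.array([0.3, -0.7, 0.5, 0.1])
    print(schedule.mask(raw_action, episode=150))  # -> [ 0.3 -0.7  0.   0. ]

An analogous schedule could be applied to the state vector, padding the observations of early stages up to the full dimensionality expected by the actor and critic networks.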

Files

Finalthesis.pdf
(pdf | 16.4 MB)
License info not available