Sample-efficient reinforcement learning for quadcopter flight control
L.T.J. Koomen (TU Delft - Aerospace Engineering)
EJ van Kampen – Mentor (TU Delft - Control & Simulation)
Abstract
The combination of reinforcement learning and deep neural networks has the potential to train intelligent autonomous agents on high-dimensional sensory inputs, with applications in flight control. However, the number of samples these methods require is often too large for training on real-world interaction. In this work, mirror descent guided policy search is identified as a promising algorithm for training high-dimensional policies on real-world samples. Several experiments are conducted to investigate how the use of expert demonstrations can further improve the sample efficiency of this algorithm when applied to the control of a quadcopter in simulation. It is shown that demonstrations, when combined with certain alterations to the mirror descent guided policy search algorithm, can significantly reduce the number of samples needed to achieve good performance. Additionally, it is shown that these improvements are robust to sub-optimal demonstrations.
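To give a flavour of the guided-policy-search alternation that mirror descent guided policy search builds on, the sketch below shows the two-step loop on a toy 1-D linear system: per-start-state local controllers are improved on rollout cost, a single global policy is fit to their state-action data by supervised regression, and the local updates are softly pulled back toward the global policy. This is an illustrative stand-in only, not the thesis's actual algorithm or quadcopter setup: the 1-D dynamics, the finite-difference local update, and the soft "pull" in place of a proper KL constraint are all assumptions made for the sake of a small runnable example.

```python
import numpy as np

# Illustrative-only sketch (assumed setup, not the thesis's implementation):
# local linear controllers u = k * x are improved per initial condition,
# a shared "global" gain is regressed onto their state-action data, and the
# local updates are softly pulled toward the global policy -- a crude
# stand-in for the KL constraint used in mirror descent guided policy search.

def rollout(k, x0, steps=20):
    """Deterministic rollout of x_{t+1} = x_t + u_t with u_t = k * x_t."""
    xs, us, cost = [], [], 0.0
    x = x0
    for _ in range(steps):
        u = k * x
        xs.append(x)
        us.append(u)
        cost += x**2 + 0.1 * u**2   # quadratic state/action cost
        x = x + u
    return np.array(xs), np.array(us), cost

def improve_local(k, k_global, x0, lr=0.05, pull=0.5):
    """One finite-difference descent step on rollout cost, then a soft
    pull toward the global gain (mirror-descent-flavoured blending)."""
    eps = 1e-4
    grad = (rollout(k + eps, x0)[2] - rollout(k - eps, x0)[2]) / (2 * eps)
    k = k - lr * float(np.clip(grad, -10.0, 10.0))
    return k + pull * (k_global - k)

x0s = [1.0, -2.0, 0.5]          # one local controller per start state
ks = [0.0] * len(x0s)
k_global = 0.0

for _ in range(50):
    # (1) improve each local controller, constrained toward the global policy
    ks = [improve_local(k, k_global, x0) for k, x0 in zip(ks, x0s)]
    # (2) supervised step: least-squares fit of the global gain to local data
    X = np.concatenate([rollout(k, x0)[0] for k, x0 in zip(ks, x0s)])
    U = np.concatenate([rollout(k, x0)[1] for k, x0 in zip(ks, x0s)])
    k_global = float(X @ U / (X @ X))

print(k_global)  # a stabilizing negative feedback gain
```

In the full algorithm the global policy is a deep network trained on samples from linear-Gaussian local controllers, and demonstrations can seed those local controllers so that far fewer real-world samples are needed before the supervised step produces a useful policy.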