Sample efficient learning of path following and obstacle avoidance behavior for Quadrotors

Journal Article (2018)

Author(s)

Stefan Stevsic (ETH Zürich)

Tobias Nägeli (ETH Zürich)

J. Alonso-Mora (TU Delft - Learning & Autonomous Control)

Otmar Hilliges (ETH Zürich)

Research Group

Learning & Autonomous Control

DOI related publication

https://doi.org/10.1109/LRA.2018.2856922

Keywords

Collision avoidance, Deep learning in robotics and automation

To reference this document use:

https://resolver.tudelft.nl/uuid:22ab6b89-1eb2-4103-be6f-e310648f5a1b

Publication Year

2018

Language

English

Issue number

4

Volume number

3

Pages (from-to)

3852-3859

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In this letter, we propose an algorithm for the training of neural network control policies for quadrotors. The learned control policy computes control commands directly from sensor inputs and is, hence, computationally efficient. An imitation learning algorithm produces a policy that reproduces the behavior of a supervisor. The supervisor provides demonstrations of path following and collision avoidance maneuvers. Due to the generalization ability of neural networks, the resulting policy performs local collision avoidance while following a global reference path. The algorithm uses a time-free model-predictive path-following controller as a supervisor. The controller generates demonstrations by following a few example paths. This enables an easy-to-implement learning algorithm that is robust to errors in the model used in the model-predictive controller. The policy is trained on the real quadrotor, which requires collision-free exploration around the example path. An adapted version of the supervisor is used to enable exploration. Thus, the policy can be trained from a relatively small number of examples on the real quadrotor, making the training sample efficient.
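
The imitation learning idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the supervisor is a hand-written linear rule standing in for the model-predictive path-following controller, the policy is a two-weight linear map rather than a neural network, and the names supervisor, collect_demonstrations, policy and fit_linear_policy are hypothetical. The sketch only shows the general pattern of sampling states around an example path, recording the supervisor's command for each state, and fitting a policy to reproduce those commands.

```python
import random


def supervisor(offset, obstacle):
    """Hypothetical stand-in for the supervisor (NOT the paper's controller).

    Steers back toward the path (negative feedback on the lateral offset)
    and away from a nearby obstacle (positive term on the proximity signal).
    """
    return -1.5 * offset + 0.8 * obstacle


def collect_demonstrations(n, seed=0):
    """Sample states around the example path and label them with the supervisor."""
    rng = random.Random(seed)
    demos = []
    for _ in range(n):
        offset = rng.uniform(-1.0, 1.0)   # lateral deviation from the path
        obstacle = rng.uniform(0.0, 1.0)  # assumed obstacle proximity signal
        demos.append(((offset, obstacle), supervisor(offset, obstacle)))
    return demos


def policy(weights, offset, obstacle):
    """Linear policy used here in place of a neural network (an assumption)."""
    return weights[0] * offset + weights[1] * obstacle


def fit_linear_policy(demos, lr=0.5, epochs=300):
    """Fit the policy to the demonstrations by gradient descent on squared error."""
    weights = [0.0, 0.0]
    n = len(demos)
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for (offset, obstacle), command in demos:
            error = policy(weights, offset, obstacle) - command
            grad[0] += error * offset
            grad[1] += error * obstacle
        weights[0] -= lr * grad[0] / n
        weights[1] -= lr * grad[1] / n
    return weights


if __name__ == "__main__":
    demos = collect_demonstrations(200)
    weights = fit_linear_policy(demos)
    # Compare the learned policy with the supervisor on a state not in the data.
    print(weights, policy(weights, 0.2, 0.5), supervisor(0.2, 0.5))
```

Because the stand-in supervisor is itself linear and noise-free, the fitted policy can match it closely. With a real controller and raw sensor inputs, a more expressive policy and real-world exploration, as described in the abstract, would be needed.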

Files

Sample_Efficient_Learning_of_P... (pdf)

(pdf | 2.51 MB)

- Embargo expired on 18-01-2019

License info not available