Directed Increment Policy Search for Behavior Tree Task Performance Optimization

Crossing the Reality Gap

Master Thesis (2017)
Author(s)

S.A. Leest (TU Delft - Aerospace Engineering)

Contributor(s)

EJ van Kampen – Mentor

G. C. H. E. de Croon – Mentor

Kirk Y.W. Scheper – Mentor

Faculty
Aerospace Engineering
Copyright
© 2017 Steven Leest
More Info
Publication Year
2017
Language
English
Graduation Date
04-12-2017
Awarding Institution
Delft University of Technology
Programme
Aerospace Engineering | Control & Simulation
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Robotic behavior policies learned in simulation suffer performance degradation once transferred to a real-world robotic platform. This degradation originates from discrepancies between the real-world and simulated environments, referred to as the reality gap. To cross the reality gap, this paper presents a simple reinforcement learning algorithm named Directed Increment Policy Search (DIPS). DIPS is a form of episodic, model-free policy search that leverages the interpretable structure and the coupling of the Behavior Tree (BT) parameters to reduce the number of required real-world evaluations. Additionally, DIPS does not require reward function crafting and is robust to hyper-parameter settings. DIPS is evaluated on a simulated model of the DelFly Explorer, which is tasked with performing a window fly-through maneuver. It is demonstrated that DIPS efficiently and effectively improves the BT behavior policy performance in three simulated environments with increasingly large reality gaps. We believe DIPS can generalize to other behavior representation methods and tasks due to the inherent coupling between behavior and environment experienced by embodied robots.
