Directed Increment Policy Search for Behavior Tree Task Performance Optimization

Crossing the Reality Gap

Master Thesis (2017)

Author(s)

S.A. Leest (TU Delft - Aerospace Engineering)

Contributor(s)

EJ van Kampen – Mentor

G. C. H. E. de Croon – Mentor

Kirk Y.W. Scheper – Mentor

Faculty

Aerospace Engineering

Keywords

Reinforcement Learning Behavior Tree Robotics Behavior Policy Search

To reference this document use:

https://resolver.tudelft.nl/uuid:ff167b06-bbaf-4897-b76c-9f246e50eadb

Publication Year

2017

Language

English

Graduation Date

04-12-2017

Awarding Institution

Delft University of Technology

Programme

Aerospace Engineering | Control & Simulation

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Robotic behavior policies learned in simulation suffer performance degradation once transferred to a real-world robotic platform. This degradation originates from discrepancies between the real-world and simulation environments, referred to as the reality gap. To cross the reality gap, this paper presents a simple reinforcement learning algorithm named Directed Increment Policy Search (DIPS). DIPS is a form of episodic model-free policy search that leverages the interpretable structure and the coupling of the Behavior Tree (BT) parameters to reduce the number of required real-world evaluations. Additionally, DIPS does not require reward function crafting and is robust to hyper-parameter settings. DIPS is evaluated on a simulated model of the DelFly Explorer, which is tasked with performing a window fly-through maneuver. It is demonstrated that DIPS efficiently and effectively improves the BT behavior policy performance in three simulated environments with increasingly large reality gaps. We believe DIPS can generalize to other behavior representation methods and tasks due to the inherent coupling between behavior and environment experienced by embodied robots.

Files

171120_Thesis_Report_Steven_Le... (pdf)

(pdf | 6.78 Mb)

License info not available