Robotic Auxiliary Losses for continuous reinforcement learning

Master Thesis (2018)
Author(s)

T. Cherici (TU Delft - Mechanical Engineering)

Contributor(s)

Thomas M. Moerland – Mentor

Pieter Jonker – Mentor

Faculty
Mechanical Engineering
Copyright
© 2018 Teo Cherici
Publication Year
2018
Language
English
Graduation Date
27-08-2018
Awarding Institution
Delft University of Technology
Programme
Mechanical Engineering | Biomechanical Design - BioRobotics
Related content

GitHub repository of Robotic Auxiliary Losses

https://github.com/TCherici/baselines
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Recent advances in computational power and artificial intelligence have enabled advanced reinforcement learning models that could revolutionize, among other fields, robotics. As model and environment complexity increase, however, training solely on the environment's reward signal becomes more difficult. Building on the work on robotic priors by R. Jonschkowski et al., we present robotic auxiliary losses for continuous reinforcement learning models. These provide additional feedback based on physical principles, such as Newton’s laws of motion, which the reinforcement learning model uses during training in robotic environments. We furthermore explore the issues of concurrently optimizing several losses and present a continuous loss normalization method for balancing training effort between the main and auxiliary losses. In all continuous robotic environments tested, individual robotic auxiliary losses consistently improve on the base reinforcement learning model. The joint application of all losses during training, however, did not always improve performance, as the concurrent optimization of several losses of differing nature proved difficult.

Files

License info not available