Imitation learning for a robotic precision placement task

Abstract

In industrial environments, robots are used for a wide variety of tasks. Currently, however, it is not feasible for companies to deploy robots for production runs with limited batch sizes or for products with large variations. Robots can become viable in such environments through a new generation of robots and software that adapt quickly to new situations, learn from their mistakes, and can be programmed without the need for an expert. One concept that can enable this transition to flexible robotics is the combination of imitation learning and reinforcement learning. Imitation learning aims to learn a task by generalizing from observations; its strength is that the robot is programmed in an intuitive way while the teacher's insight is incorporated into the execution of the task.

This research studies the combination of imitation learning and reinforcement learning, applied to an industrial use case. The research question of this study is: "Can imitation learning be combined with reinforcement learning to achieve a successful application in an industrial robotic precision placement task?" Dynamic Movement Primitives (DMPs) are used to encode the demonstrated trajectories. A DMP can be seen as a spring-damper system with a non-linear forcing term, where the forcing term is a weighted sum of Gaussian basis functions. Reinforcement learning can be applied to these weights to alter the shape of the trajectory generated by the DMP. Policy Gradients with Parameter-based Exploration (PGPE) is used as the reinforcement learning algorithm to optimize the recorded trajectories.

Experiments on a UR5 show that, even without the learning step, the DMPs provide a trajectory that results in successful execution of a robotic precision placement task. The experiments also show that the learning algorithm is not able to remove noise from a demonstrated trajectory or to complete a partially demonstrated trajectory. It can therefore be concluded that the PGPE algorithm, in its current form, is not suited for reinforcement learning in robotics, and it is recommended to apply a data-efficient version of PGPE in order to achieve better learning results.
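
The abstract describes a DMP as a spring-damper system driven by a non-linear forcing term built from weighted Gaussian basis functions. The following is a minimal one-dimensional sketch of that idea, assuming the standard discrete DMP formulation of Ijspeert et al.; the gains, basis layout, and class name `DMP` are illustrative choices, not taken from the thesis.

```python
import numpy as np

# Minimal one-dimensional discrete DMP sketch (assumed formulation after
# Ijspeert et al.; the thesis may use a different parameterization).
class DMP:
    def __init__(self, n_basis=20, alpha=25.0, beta=6.25, alpha_x=3.0, tau=1.0):
        self.alpha, self.beta = alpha, beta      # spring-damper gains
        self.alpha_x, self.tau = alpha_x, tau    # canonical decay, duration
        self.c = np.exp(-alpha_x * np.linspace(0, 1, n_basis))  # basis centers
        self.h = 1.0 / np.gradient(self.c) ** 2                 # basis widths
        self.w = np.zeros(n_basis)               # learnable weights

    def forcing(self, x, y0, g):
        # Weighted sum of Gaussian basis functions, gated by the phase x
        psi = np.exp(-self.h * (x - self.c) ** 2)
        return (psi @ self.w) / (psi.sum() + 1e-10) * x * (g - y0)

    def rollout(self, y0, g, dt=0.001, T=1.0):
        y, z, x = y0, 0.0, 1.0
        traj = []
        for _ in range(int(T / dt)):
            f = self.forcing(x, y0, g)
            # Spring-damper transformation system plus forcing term
            z += dt / self.tau * (self.alpha * (self.beta * (g - y) - z) + f)
            y += dt / self.tau * z
            x += dt / self.tau * (-self.alpha_x * x)  # phase decays 1 -> 0
            traj.append(y)
        return np.array(traj)
```

In practice the weights `w` would first be fitted to a demonstrated trajectory (for example with locally weighted regression), so that with zero weights the system simply converges to the goal `g`, while fitted weights reproduce the shape of the demonstration.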
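PGPE explores directly in parameter space: it samples the DMP weights from a Gaussian with a learnable mean and per-parameter standard deviation, and updates both from the episodic reward. The sketch below is a simplified single-sample variant of the update rules from Sehnke et al. (2010); the reward function `evaluate` and all hyperparameters are hypothetical placeholders for the task-specific rollout.

```python
import numpy as np

def pgpe(evaluate, n_params, iters=200, pop=10, lr_mu=0.2, lr_sigma=0.1, seed=0):
    """Basic PGPE with a moving-average baseline.

    `evaluate(theta)` is assumed to return the episodic reward of one
    rollout executed with DMP weights `theta`; it is task-specific.
    """
    rng = np.random.default_rng(seed)
    mu = np.zeros(n_params)        # mean of the parameter distribution
    sigma = np.ones(n_params)      # per-parameter exploration std
    baseline = 0.0
    for _ in range(iters):
        for _ in range(pop):
            eps = rng.normal(0.0, sigma)     # parameter-space exploration
            r = evaluate(mu + eps)
            adv = r - baseline               # baseline reduces variance
            mu += lr_mu * adv * eps          # gradient step on the mean
            sigma += lr_sigma * adv * (eps**2 - sigma**2) / sigma
            sigma = np.maximum(sigma, 1e-3)  # keep exploration positive
            baseline += 0.1 * (r - baseline) # moving-average baseline
    return mu
```

Because every update consumes a full rollout on the physical robot, the sample cost of this scheme illustrates the conclusion above: a more data-efficient variant of PGPE would be needed before such learning becomes practical on hardware.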