Learning Task Space Policies from Demonstration

Master Thesis (2021)
Author(s)

L.K. Suresh Kumar (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J. Kober – Mentor (TU Delft - Learning & Autonomous Control)

Carlos Celemin – Mentor (TU Delft - Learning & Autonomous Control)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2021 Lalith Keerthan Suresh Kumar
Publication Year
2021
Language
English
Graduation Date
14-10-2021
Awarding Institution
Delft University of Technology
Programme
Electrical Engineering | Embedded Systems
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In this thesis, we propose Task Space Policy Learning (TaSPL), a novel technique that learns a generalised task/state space policy, as opposed to a policy in state-action space, from interactive corrections in the observation space or from state-only demonstration data. This task/state space policy enables the agent to execute the task when the dynamics of the environment differ from the original environment, without additional demonstrative effort from a human teacher. We achieve this by decoupling the task into a task space policy, which describes how the observable states should transition in order to reach the goal of the task, and an indirect inverse dynamics model, which is responsible for producing the action that achieves the desired transition, effectively decoupling the task objective from the dynamics of the environment. If the dynamics of the environment change, only the agent's dynamics model has to be relearnt, while the task space policy can be reused.
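As a rough illustration of this decoupling, the sketch below shows the two modules as placeholder Python classes: a task space policy that predicts the desired next observable state, and an indirect inverse dynamics model that recovers the action realising that transition. All names, interfaces, and the linear stand-ins are illustrative assumptions, not the thesis's actual implementation.

    # Minimal sketch of the task-space / dynamics decoupling described above.
    # Module names, interfaces, and the linear stand-ins are illustrative
    # assumptions, not the implementation from the thesis.
    import numpy as np

    class TaskSpacePolicy:
        """Maps the current observable state to a desired next state."""

        def __init__(self, weights: np.ndarray):
            self.weights = weights  # placeholder for a learned model

        def desired_next_state(self, state: np.ndarray) -> np.ndarray:
            # In the thesis this is learned from demonstrations or
            # interactive corrections; a linear map stands in for it here.
            return self.weights @ state

    class IndirectInverseDynamics:
        """Recovers the action that achieves a desired state transition."""

        def __init__(self, weights: np.ndarray):
            self.weights = weights  # placeholder for a learned model

        def action(self, state: np.ndarray, desired: np.ndarray) -> np.ndarray:
            # Environment-specific: only this module would need to be
            # relearnt when the environment dynamics change.
            return self.weights @ np.concatenate([state, desired])

    def control_step(policy, dynamics, state):
        """One decoupled step: plan the state transition, then realise it."""
        desired = policy.desired_next_state(state)
        return dynamics.action(state, desired)

    if __name__ == "__main__":
        state_dim = 4
        policy = TaskSpacePolicy(np.eye(state_dim))
        dynamics = IndirectInverseDynamics(np.zeros((2, 2 * state_dim)))
        print(control_step(policy, dynamics, np.ones(state_dim)))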
The method was tested and compared to other imitation learning methods on various control tasks from the OpenAI Gym toolkit in their original environments. The obtained policies were also tested in modified environments, showing that this method can produce imitation policies with the benefits of interactive IL methods while also generalising that knowledge to varied conditions unseen during the teacher interventions. The method was further validated on two different tasks with a KUKA iiwa robot manipulator, testing the generalisation capabilities of the learnt policies.
