Learning Task Space Policies from Demonstration

Abstract

In this thesis, we propose a method called "Task Space Policy Learning (TaSPL)", a novel technique that learns a generalised task/state space policy, as opposed to a policy in state-action space, from interactive corrections in the observation space or from state-only demonstration data. This task/state space policy enables the agent to execute the task when the dynamics of the environment change from the original environment, without requiring additional demonstration effort from a human teacher. We achieve this by
decoupling the task into a task space policy and a dynamics model. The task space policy describes how the observable states should transition in order to reach the goal of the task, while an indirect inverse dynamics model is responsible for selecting the action that produces the desired transition. This effectively decouples the task objective from the dynamics of the environment: if the dynamics of the environment change, only the agent's dynamics model has to be relearnt, while the task space policy can be reused.
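As a rough illustration of this decoupling, the sketch below pairs a task space policy (here a nearest-neighbour lookup over state-only demonstration transitions) with a learnt inverse dynamics model composed at execution time. The class names, model choices, and composition function are illustrative assumptions, not the implementation described in the thesis.

```python
import numpy as np

class TaskSpacePolicy:
    """Predicts the desired next observable state from the current state.

    Here: a simple nearest-neighbour lookup over demonstrated transitions
    (an assumption for illustration; any regression model could be used).
    """
    def __init__(self, demo_states, demo_next_states):
        self.demo_states = np.asarray(demo_states)
        self.demo_next_states = np.asarray(demo_next_states)

    def desired_next_state(self, state):
        idx = np.argmin(np.linalg.norm(self.demo_states - state, axis=1))
        return self.demo_next_states[idx]

class InverseDynamicsModel:
    """Maps (current state, desired next state) to the action that achieves it.

    Fitted on the agent's own interaction data, so only this model needs to
    be relearnt when the environment dynamics change.
    """
    def fit(self, states, next_states, actions):
        X = np.hstack([states, next_states])          # (N, 2d) regressors
        self.W, *_ = np.linalg.lstsq(X, actions, rcond=None)

    def action(self, state, desired_next_state):
        return np.hstack([state, desired_next_state]) @ self.W

def act(policy, dynamics, state):
    """Compose the two models: the task space policy proposes a transition,
    the inverse dynamics model selects the action that realises it."""
    return dynamics.action(state, policy.desired_next_state(state))
```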
The method was tested and compared to other imitation learning methods on various control tasks from the OpenAI Gym toolkit in their original environments. The obtained policies were also tested in modified environments, showing that the method yields imitation policies with the benefits of interactive IL methods while generalizing to a variety of conditions unseen during the teacher interventions. The method was further validated in two different tasks with a KUKA iiwa robot manipulator, testing the generalization capabilities of the learnt policies.