Learning Task Space Policies from Demonstration

Abstract

In this thesis, we propose a method called "Task Space Policy Learning (TaSPL)", a novel technique that learns a generalised task/state space policy, as opposed to a policy in state-action space, from interactive corrections in the observation space or from state-only demonstration data. This task/state space policy enables the agent to execute the task when the dynamics of the environment change from the original environment, without requiring additional demonstration effort from a human teacher. We achieve this by
decoupling the task into a task space policy and a dynamics model. The task space policy describes how the observable states should transition in order to reach the goal of the task, while an indirect inverse dynamics model is responsible for selecting the action that produces the desired transition. This effectively decouples the task objective from the dynamics of the environment: if the dynamics of the environment change, only the agent's dynamics model has to be relearnt, while the task space policy can be reused.
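As a rough illustration of this decoupling, the sketch below pairs a task space policy (here a nearest-neighbour lookup over state-only demonstration transitions) with a learnt inverse dynamics model composed at execution time. The class names, model choices, and composition function are illustrative assumptions, not the implementation described in the thesis.

```python
import numpy as np

class TaskSpacePolicy:
    """Predicts the desired next observable state from the current state.

    Here: a simple nearest-neighbour lookup over demonstrated transitions
    (an assumption for illustration; any regression model could be used).
    """
    def __init__(self, demo_states, demo_next_states):
        self.demo_states = np.asarray(demo_states)
        self.demo_next_states = np.asarray(demo_next_states)

    def desired_next_state(self, state):
        idx = np.argmin(np.linalg.norm(self.demo_states - state, axis=1))
        return self.demo_next_states[idx]

class InverseDynamicsModel:
    """Maps (current state, desired next state) to the action that achieves it.

    Fitted on the agent's own interaction data, so only this model needs to
    be relearnt when the environment dynamics change.
    """
    def fit(self, states, next_states, actions):
        X = np.hstack([states, next_states])          # (N, 2d) regressors
        self.W, *_ = np.linalg.lstsq(X, actions, rcond=None)

    def action(self, state, desired_next_state):
        return np.hstack([state, desired_next_state]) @ self.W

def act(policy, dynamics, state):
    """Compose the two models: the task space policy proposes a transition,
    the inverse dynamics model selects the action that realises it."""
    return dynamics.action(state, policy.desired_next_state(state))
```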
The method was tested and compared to other imitation learning methods on various control tasks from the OpenAI Gym toolkit in their original environments. The obtained policies were also tested in modified environments, showing that the method yields imitation policies with the benefits of interactive IL methods while generalizing to a variety of conditions unseen during the teacher interventions. The method was further validated in two different tasks with a KUKA iiwa robot manipulator, testing the generalization capabilities of the learnt policies.