Interactive Learning in State-space

Enabling robots to learn from non-expert humans


Abstract

Imitation Learning is a technique for programming the behavior of agents through demonstration, as opposed to manually engineering that behavior. However, Imitation Learning methods require demonstration data (in the form of state-action labels), and in many scenarios such demonstrations are expensive to obtain or too complex for a demonstrator to execute. This scarcity or sub-optimality of demonstrations limits the applicability and performance of many Imitation Learning methods.
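To make the state-action label requirement concrete, the sketch below shows generic behavior cloning: a small policy network fit to demonstrated (state, action) pairs by supervised regression. This is an illustrative Python/PyTorch example with assumed dimensions and names, not the implementation used in the thesis.

```python
# Minimal behavior-cloning sketch: fit a policy to demonstrated
# (state, action) labels by supervised regression. Dimensions are assumed.
import torch
import torch.nn as nn

state_dim, action_dim = 4, 1          # e.g. a CartPole-like task (assumed)
policy = nn.Sequential(
    nn.Linear(state_dim, 64), nn.ReLU(),
    nn.Linear(64, action_dim),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def behavior_cloning_step(states, actions):
    """One supervised update on a batch of demonstrated state-action labels."""
    pred = policy(states)                         # actions the policy would take
    loss = nn.functional.mse_loss(pred, actions)  # match the demonstrator's actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example call with a placeholder batch; real data would come from a demonstrator.
demo_states = torch.randn(32, state_dim)
demo_actions = torch.randn(32, action_dim)
behavior_cloning_step(demo_states, demo_actions)
```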

Advancements in Interactive Imitation Learning techniques, however, have made it easier for demonstrators to train agents and improve their performance. In these techniques, demonstrators interact with and guide the agent as it performs the required task. This guidance typically takes the form of corrections or feedback on the actions currently being executed by the agent.

In this thesis, a novel Interactive Learning technique is proposed that uses human corrective feedback in state-space to train and improve agent behavior. This technique is beneficial because providing guidance in terms of "changing the agent's state" is often easier and more intuitive for the human demonstrator than changing the actions being executed. For instance, in manipulation tasks with a robotic arm, it is easier for the demonstrator to provide state information, such as the Cartesian position of the end-effector, than low-level action information, such as joint angles. With such scenarios in mind, we propose our method, Teaching Imitative Policies in State-space (TIPS).
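As a rough illustration of the idea (not the exact TIPS algorithm), the sketch below shows one way a state-space correction could be turned into an action label for the policy: the human's feedback defines a desired next state, and an assumed inverse dynamics model maps the current and desired states to an action. The function names, the step_size parameter, and the stand-in linear model are all hypothetical.

```python
# Hedged sketch: convert human feedback given in state-space into an action label.
# Assumptions (not taken from the abstract): the correction is a desired change
# `delta_s` in the observed state, and an approximate inverse dynamics model
# inverse_dynamics(s, s_desired) -> a is available, e.g. learned from transitions.
import numpy as np

def action_from_state_feedback(state, delta_s, inverse_dynamics, step_size=0.1):
    """Map a state-space correction to an action label for policy training."""
    desired_state = state + step_size * delta_s   # where the human wants the agent to go
    action_label = inverse_dynamics(state, desired_state)
    return desired_state, action_label

# Illustrative use with a stand-in linear model s' = A s + B a (purely hypothetical).
A = np.eye(4)
B = np.ones((4, 1))
inv_dyn = lambda s, s_next: np.linalg.pinv(B) @ (s_next - A @ s)

s = np.zeros(4)
feedback = np.array([0.0, 1.0, 0.0, 0.0])         # e.g. "increase the second state dimension"
s_des, a = action_from_state_feedback(s, feedback, inv_dyn)
```

The resulting (state, action label) pairs could then be used to update the policy with the same supervised objective as in the behavior-cloning sketch above.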

We evaluate the performance of TIPS on various control tasks from the OpenAI Gym toolkit as well as on a manipulation task using a KUKA LBR iiwa robotic arm. We show that, through continuous improvement via feedback, agents trained using TIPS outperform the demonstrator and, in turn, outperform conventional Imitation Learning agents.