Interactive Imitation Learning for Force Control

Position And Stiffness Teaching with Interactive Learning


Abstract

To generalize the use of robotics, a few hurdles remain. One of these is the programming of the robots. Most robots on the market today employ position control, with a set of controller parameters tuned by an expert. This programming is expensive, suited only to a single task in a single configuration, and not safe for interaction. This thesis addresses these problems by introducing Position And Stiffness Teaching with Interactive Learning (PASTIL) and History Aware PASTIL (HA-PASTIL), a novel interactive way of learning scalable variable impedance policies. The system learns positional reference trajectories and stiffness trajectories at the same time. PASTIL and HA-PASTIL learn these policies from positional corrections applied by a human teacher through physical human-robot interaction (pHRI). Only the proprioceptive sensors of the robot are used to measure and extract these corrections, so no force/torque sensors are required. To learn from these corrections, the intention of the teacher is estimated by segmenting the correction space into three parts. Each of these three parts corresponds to a set of update rules for the policy that fits the intention of a correction in that segment. In this thesis, the proposed algorithms are validated through a series of experiments on sample tasks and compared with baseline algorithms. The main conclusions from these tests are that PASTIL and HA-PASTIL, as introduced in this thesis, outperform the baseline algorithms on task performance for all tasks, and that the learned stiffness contributes positively to task performance. This means that the algorithms proposed here allow simple systems, with only proprioceptive sensors, to be instructed by users instead of experts, making it possible to apply robotics at lower cost and with less expertise needed to program and operate. Some aspects of these algorithms still call for further research; the most important example is that they have not yet been tested on an actual robot with physical human-robot interaction. Quite some work remains, but the proposed algorithms might pave the way for more, and better, algorithms that aim to learn force control behaviour from only positional corrections.
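The abstract only sketches how the correction space is segmented and mapped to update rules. As a rough, non-authoritative illustration of that general pattern, the Python sketch below classifies a correction by its magnitude and applies a segment-specific update to a reference point and a scalar stiffness. The segment names, thresholds, learning rate, and stiffness rules are all assumptions made here for illustration; they are not the PASTIL or HA-PASTIL rules from the thesis.

import numpy as np

def segment_correction(delta, low=0.005, high=0.03):
    # Classify a positional correction by magnitude into one of three
    # segments; the thresholds (in metres) are hypothetical tuning values.
    m = np.linalg.norm(delta)
    if m < low:
        return "noise"      # no intentional correction detected
    if m < high:
        return "refine"     # small, deliberate adjustment
    return "relocate"       # large correction: the reference is off

def update_policy(ref, stiffness, delta, lr=0.5, k_step=50.0,
                  k_min=100.0, k_max=1000.0):
    # Apply a segment-specific update to the reference point and the
    # stiffness; these update rules are illustrative placeholders.
    seg = segment_correction(delta)
    if seg == "noise":
        # No teacher intent detected: keep the reference, stiffen slightly.
        stiffness = np.clip(stiffness + k_step, k_min, k_max)
    elif seg == "refine":
        # Small correction: nudge the reference, keep the current stiffness.
        ref = ref + lr * delta
    else:  # "relocate"
        # Large correction: move the reference fully and soften.
        ref = ref + delta
        stiffness = np.clip(stiffness - k_step, k_min, k_max)
    return ref, stiffness

# Example: a 3-D correction extracted from proprioception alone, e.g. the
# gap between the measured and the commanded end-effector position.
ref = np.zeros(3)
delta = np.array([0.0, 0.02, 0.0])
ref, stiffness = update_policy(ref, 400.0, delta)

In such a scheme, delta would come from the robot's own joint measurements under impedance control, which is consistent with the abstract's claim that no force/torque sensors are needed; everything else in the sketch is a stand-in for the actual update rules described in the thesis.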