Learning Variable Impedance Control

A Model-Based Approach Using Gaussian Processes

Abstract

Modern robotic systems are increasingly expected to interact with unstructured and unpredictable environments. This has reinforced the importance of sophisticated reasoning and adaptive motor skill learning. Although low-level methodologies for sensorimotor control have been relatively well studied, constrained motion for robotic manipulators in general environments remains a challenge. In addition to viable kinematic trajectories, successful interaction requires a system to adjust the magnitude and direction of the force applied to an object or human being. One force control architecture that is stable under a wide range of conditions is impedance control. However, the defining inertial, damping and stiffness parameters are highly task-dependent and often difficult to deduce a priori. To this end, one promising strategy is reinforcement learning. Several frameworks have already emerged that are capable of learning compliant behaviour in this fashion. However, the complex and sometimes discontinuous nature of physical interaction in robotics poses additional challenges in designing algorithms capable of learning complex behaviours with minimal interaction time. This thesis details an extension to the PILCO algorithm for learning variable impedance control. The proposed method attempts to construct a Gaussian Process model of the robot-environment system through interaction. This approach permits long-term inference and planning in a fully Bayesian manner, reducing the interaction time required for convergence and allowing for efficient analytical gradient-based policy updates. Two skill learning problems are investigated, both in simulation and in experiments on a 7-DOF KUKA LBR iiwa 7 R800 robot. The first scenario entails learning state-conditioned variable stiffness parameters along predefined motion plans. In the second scenario, the agent learns both stiffness parameters and kinematic trajectories simultaneously to complete a constrained motion task.