I. Koryakovskiy | TU Delft Repository

Safer reinforcement learning for robotics

Doctoral thesis (2018) - Ivan Koryakovskiy

Reinforcement learning is an active research area in the fields of artificial intelligence and machine learning, with applications in control. The most important feature of reinforcement learning is its ability to learn without prior knowledge about the system. However, in the real world, actions taken without any prior knowledge may seriously damage the controlled robot or its surroundings. Safety, an often neglected factor in the reinforcement learning community, requires greater attention from researchers. Prior knowledge can increase safety during learning. At the same time, it can severely limit the set of possible solutions and hamper learning performance. This thesis discusses the influence of different forms of prior knowledge on learning performance and the risk of robot damage, where prior knowledge ranges from physics-based assumptions, such as the robot construction and material properties, to knowledge of the task curriculum, or an approximate model possibly coupled with a nominal controller. ...

Combining multi-level real-time iterations of Nonlinear Model Predictive Control to realize squatting motions on Leo

Report (2018) - Manuel Kudruss, Ivan Koryakovskiy, Heike Vallery, Katja Mombaur, Christian Kirches

Today’s humanoid robots are complex mechanical systems with many degrees of freedom that are built to achieve locomotion skills comparable to humans. In order to synthesize whole-body motions, real-time capable direct methods of optimal control are a subject of contemporary research. To this end, Nonlinear Model Predictive Control (NMPC) is the method of choice to realize motions on the physical robot using model-based optimal control. However, the complexity of the problem results in a high computational time that falls short of the expectations of robotic experimenters and control engineers. In this article, we show how advanced NMPC methods can be applied to improve the control rate by a factor of 10–16 up to 190 Hz. This is achieved by thread-based parallelization of two controllers and by efficiently reusing control problem linearizations of the last iteration to provide fast feedback by one controller while the other controller prepares the next nonlinear step, including the evaluation of the multi-body dynamics and the respective sensitivities. This way, the bottleneck of the roll-out of up to 130 ms can partly be side-stepped by repeated calls of the much faster feedback phase of ~5 ms. This enables a realization of a squatting task on the actual 2D robot Leo of Delft University of Technology, which was not possible using a conventional Nonlinear Model Predictive Control scheme. ...

Model-plant mismatch compensation using reinforcement learning

Journal article (2018) - Ivan Koryakovskiy, Manuel Kudruss, Heike Vallery, Robert Babuska, Wouter Caarls

Learning-based approaches are suitable for the control of systems with unknown dynamics. However, learning from scratch involves many trials with exploratory actions until a good control policy is discovered. Real robots usually cannot withstand the exploratory actions and suffer damage. This problem can be circumvented by combining learning with model-based control. In this letter, we employ a nominal model-predictive controller that is impeded by the presence of an unknown model-plant mismatch. To compensate for the mismatch, we propose two approaches to combining reinforcement learning with the nominal controller. The first approach learns a compensatory control action that minimizes the same performance measure as is minimized by the nominal controller. The second approach learns a compensatory signal from the difference between a transition predicted by the internal model and the actual transition. We compare the approaches in simulations on a robot attached to the ground and performing a setpoint reaching task. We implement the better approach on the real robot and demonstrate successful learning results. ...
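The transition-difference signal mentioned in this abstract can be illustrated with a toy example. The sketch below is not the authors' method: it assumes a hypothetical one-dimensional plant with a constant input bias, a proportional controller standing in for the nominal model-predictive controller, and a simple running-average estimator in place of a learned compensator. All names and parameter values are assumptions chosen for illustration.

```python
def simulate(compensate, steps=200, dt=0.05, bias=-0.5, gain=2.0, rate=0.2):
    """Return the final setpoint error of a toy 1-D system.

    Hypothetical setup: the internal model is x' = x + dt * u, while the
    plant is x' = x + dt * (u + bias). The bias is the model-plant mismatch.
    """
    x, target = 0.0, 1.0
    bias_estimate = 0.0
    for _ in range(steps):
        # Nominal controller (proportional; stands in for an MPC law).
        u_nominal = gain * (target - x)
        # Optionally subtract the current estimate of the mismatch.
        u = u_nominal - bias_estimate if compensate else u_nominal

        x_predicted = x + dt * u        # transition predicted by the model
        x_actual = x + dt * (u + bias)  # transition produced by the plant

        # The difference between the two transitions exposes the mismatch.
        residual = (x_actual - x_predicted) / dt
        # Running-average update (a stand-in for a learned compensator).
        bias_estimate += rate * (residual - bias_estimate)
        x = x_actual
    return abs(target - x)


if __name__ == "__main__":
    print("error without compensation:", simulate(compensate=False))
    print("error with compensation:   ", simulate(compensate=True))
```

In this toy setting the uncompensated controller settles with a steady-state offset of `-bias / gain`, while the compensated one removes it. A real robot has state-dependent mismatch, which is where a learned compensator becomes relevant.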

Evaluation of physical damage associated with action selection strategies in reinforcement learning

Journal article (2017) - Ivan Koryakovskiy, Heike Vallery, Robert Babuška, Wouter Caarls

Reinforcement learning techniques enable robots to deal with their own dynamics and with unknown environments without using explicit models or preprogrammed behaviors. However, reinforcement learning relies on intrinsically risky exploration, which is often damaging for physical systems. In the case of the bipedal walking robot Leo, which is studied in this paper, two sources of damage can be identified: fatigue of gearboxes due to backlash re-engagements, and the overall system damage due to falls of the robot. We investigate several exploration techniques and compare them in terms of gearbox fatigue, cumulative number of falls and undiscounted return. The results show that exploration with the Ornstein-Uhlenbeck (OU) process noise leads to the highest return, but at the same time it causes the largest number of falls. The Previous Action-Dependent Action (PADA) method results in drastically reduced fatigue, but also a large number of falls. The results reveal a previously unknown trade-off between the two sources of damage. Inspired by the OU and PADA methods, we propose four new action-selection methods in a systematic way. One of the proposed methods with a time-correlated noise outperforms the well-known ε-greedy method in all three benchmarks. We provide guidance towards the choice of exploration strategy for reinforcement learning applications on real physical systems. ...
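To make the notion of time-correlated exploration noise concrete, the following minimal sketch generates a discretized Ornstein-Uhlenbeck sequence and compares its lag-1 autocorrelation with that of uncorrelated Gaussian noise. The parameter values (`theta`, `sigma`, `dt`) are assumptions for illustration and are not taken from the paper; the sketch does not reproduce any of the proposed action-selection methods.

```python
import random


def ou_noise(n_steps, theta=0.15, sigma=0.2, dt=1.0, seed=0):
    """Euler-discretized Ornstein-Uhlenbeck process with zero mean."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_steps):
        # Mean-reverting drift plus a Gaussian increment.
        x += -theta * x * dt + sigma * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


def white_noise(n_steps, sigma=0.2, seed=0):
    """Independent Gaussian samples, for comparison."""
    rng = random.Random(seed)
    return [sigma * rng.gauss(0.0, 1.0) for _ in range(n_steps)]


def lag1_autocorrelation(xs):
    """Sample autocorrelation between consecutive elements."""
    mean = sum(xs) / len(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((a - mean) ** 2 for a in xs)
    return num / den


if __name__ == "__main__":
    print("OU lag-1 autocorrelation:   ", lag1_autocorrelation(ou_noise(5000)))
    print("white lag-1 autocorrelation:", lag1_autocorrelation(white_noise(5000)))
```

Consecutive OU samples are strongly correlated, so the perturbation added to an action changes smoothly from step to step, whereas independent noise can reverse sign at every step.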

Benchmarking model-free and model-based optimal control

Journal article (2017) - Ivan Koryakovskiy, Manuel Kudruss, Robert Babuška, Wouter Caarls, Christian Kirches, Katja Mombaur, Johannes P. Schlöder, Heike Vallery

Model-free reinforcement learning and nonlinear model predictive control are two different approaches for controlling a dynamic system in an optimal way according to a prescribed cost function. Reinforcement learning acquires a control policy through exploratory interaction with the system, while nonlinear model predictive control exploits an explicitly given mathematical model of the system. In this article, we provide a comprehensive comparison of the performance of reinforcement learning and nonlinear model predictive control for an ideal system as well as for a system with parametric and structural uncertainties. The comparison is based on two different criteria, namely the similarity of trajectories and the resulting rewards. The evaluation of both methods is performed on a standard benchmark problem: a cart–pendulum swing-up and balance task. We first find suitable mathematical formulations and discuss the effect of the differences in the problem formulations. Then, we investigate the robustness of reinforcement learning and nonlinear model predictive control against uncertainties. The results demonstrate that nonlinear model predictive control has advantages over reinforcement learning if uncertainties can be eliminated through identification of the system parameters. Otherwise, there exists a break-even point after which model-free reinforcement learning performs better than nonlinear model predictive control with an inaccurate model. These findings suggest that benefits can be obtained by combining these methods for real systems being subject to such uncertainties. In the future, we plan to develop a hybrid controller and evaluate its performance on a real seven-degree-of-freedom walking robot. ...

Reinforcement learning of potential fields to achieve limit-cycle walking

Conference paper (2016) - D.S. Feirstein (student), Ivan Koryakovskiy, Jens Kober, Heike Vallery

Reinforcement learning is a powerful tool to derive controllers for systems where no models are available. Policy search algorithms are particularly suitable for complex systems, as they keep learning time manageable and account for continuous state and action spaces. However, these algorithms demand more insight into the system to choose a suitable controller parameterization. This paper investigates a type of policy parameterization for impedance control that allows energy input to be implicitly bounded: potential fields. In this work, a methodology for generating a potential field-constrained impedance controller via approximation of example trajectories, and subsequently improving the control policy using reinforcement learning, is presented. The potential field-constrained approximation is used as a policy parameterization for policy search reinforcement learning and is compared to its unconstrained counterpart. Simulations on a simple biped walking model show the learned controllers are able to surpass the potential field of gravity by generating a stable limit-cycle gait on flat ground for both parameterizations. The potential field-constrained controller provides safety with a known energy bound while performing as well as the unconstrained policy. ...
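The energy bound mentioned in this abstract follows from a general property of forces derived from a potential: the work they do along any path equals the drop in potential, so it cannot exceed the starting potential minus the minimum of the field. The sketch below checks this numerically for a hypothetical one-dimensional quadratic potential; the stiffness `K` and the example path are assumptions, and the code is not the controller from the paper.

```python
K = 4.0  # hypothetical stiffness of the quadratic potential field


def potential(x):
    """Potential energy U(x) = 0.5 * K * x^2, with minimum U(0) = 0."""
    return 0.5 * K * x * x


def force(x):
    """Force derived from the potential: F(x) = -dU/dx."""
    return -K * x


def work_along_path(path):
    """Work done by the field along a piecewise-linear path.

    The midpoint rule is exact here because the force is linear in x.
    """
    total = 0.0
    for a, b in zip(path, path[1:]):
        total += force(0.5 * (a + b)) * (b - a)
    return total


if __name__ == "__main__":
    # An arbitrary back-and-forth path starting at x = 1.0.
    path = [1.0, 0.2, -0.7, 0.4, 0.0]
    work = work_along_path(path)
    print("work done by the field:", work)
    print("potential drop:        ", potential(path[0]) - potential(path[-1]))
    print("upper bound on work:   ", potential(path[0]) - potential(0.0))
```

However the path wanders, the work matches the potential drop between its endpoints, which is why a policy restricted to such a field has a known limit on the energy it can inject.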