Institutional Repository

Reinforcement Learning on autonomous humanoid robots

Cite or link this publication as: doi:10.4233/uuid:986ea1c5-9e30-4aac-ab66-4f3b6b6ca002
Author: Schuitema, E.
Promotor: Jonker, P.P. · Babuska, R.
Faculty: Mechanical, Maritime and Materials Engineering
Department: BioMechanical Engineering
ISBN: 9789461860750
Keywords: robotics · robots · reinforcement learning · markov decision process · temporal difference learning
Rights: (c) 2012 Schuitema, E.


Service robots have the potential to be of great value in households, health care and other labor-intensive environments. However, these environments are typically unique, poorly structured and frequently changing, which makes it difficult to make service robots robust and versatile through manual programming. Having robots learn to solve tasks autonomously through interaction with the real world is an attractive alternative. With Reinforcement Learning (RL), a system learns to perform tasks from only coarse feedback on its actions: desired behavior is reinforced with positive rewards, and undesired behavior is punished with negative rewards.
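The reward-driven learning loop described above can be illustrated with a minimal tabular Q-learning sketch, a standard temporal difference method. This is only an illustration under stated assumptions: the corridor task, its reward values, and the parameters `alpha`, `gamma` and `epsilon` are hypothetical and not taken from the thesis. Leo's real state space is continuous, and the algorithms used on the robot may differ.

```python
import random

# Hypothetical toy MDP (not from the thesis): a 1-D corridor of N_STATES cells.
# The agent starts in cell 0; reaching the last cell yields +1 and ends the
# episode, while every other step costs -0.01 (a small "time" penalty).
N_STATES = 6
ACTIONS = (-1, +1)  # index 0 = step left, index 1 = step right


def step(state, action):
    """Deterministic transition: move, clipped to the corridor bounds."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True
    return next_state, -0.01, False


def q_learning(episodes=500, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max(range(2), key=lambda i: q[s][i])
            s2, r, done = step(s, ACTIONS[a])
            # TD target: immediate reward plus discounted best next value.
            target = r if done else r + gamma * max(q[s2])
            # Move the estimate a fraction alpha toward the target.
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q


def greedy_policy(q):
    """Greedy action (-1 or +1) for every non-terminal cell."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])]
            for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

Positive rewards at the goal propagate backward through the TD updates, so after training the greedy policy steps right in every cell. The -0.01 step cost plays the same role as the time penalty mentioned for Leo's walking task.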

In this research, a bipedal walking robot named Leo was designed and built specifically to study the application of RL to real robots. Leo is able to learn two basic motor control tasks: placing a foot on a stair step, and walking. To learn to walk, Leo receives a positive reward for moving its foot forward, and negative rewards for falling and for spending time and energy. In simulation, this takes about 5 hours of practice and thousands of falls. On the real prototype, the learning time was shortened by first letting the robot observe a hand-coded, suboptimal controller, which it quickly learned to mimic and even improve upon within a matter of hours. Algorithmic improvements are proposed to address complications of RL on real robots, such as time delays in the control loop and large disturbances like a sudden push. To reduce the continuous risk of damage inherent in the trial-and-error nature of RL, a modular approach is proposed through which the robot can learn about the risk of its behavior coarsely but quickly, and learn the actual task more safely and in more detail.
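The walking reward described above combines a progress term with penalties for falling, elapsed time, and energy use. The following is a minimal sketch that assumes a per-control-step reward built from linear terms. The weights, units, and the names `RewardWeights` and `walking_reward` are hypothetical and chosen only for illustration. They are not Leo's actual reward parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical, illustrative weights; not the values used on Leo.
    progress: float = 100.0  # reward per metre of forward foot displacement
    fall: float = -50.0      # one-off penalty when the robot falls
    time: float = -1.0       # constant penalty per control step
    energy: float = -0.1     # penalty per joule of energy spent


def walking_reward(foot_dx_m: float, energy_j: float, fell: bool,
                   w: RewardWeights = RewardWeights()) -> float:
    """Per-step reward combining the four terms named in the abstract."""
    r = w.progress * foot_dx_m + w.time + w.energy * energy_j
    if fell:
        r += w.fall
    return r


if __name__ == "__main__":
    # A step that moves the foot 2 cm forward using 5 J of energy.
    print(walking_reward(0.02, 5.0, False))
    # A step in which the robot falls without moving forward.
    print(walking_reward(0.0, 0.0, True))
```

The relative sizes of the weights set the trade-off. A fall penalty much larger than the per-step terms discourages risky gaits, while the time and energy terms push the learned gait toward fast and economical walking.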
