Reinforcement Learning on autonomous humanoid robots

Abstract

Service robots have the potential to be of great value in households, health care, and other labor-intensive environments. However, these environments are typically unique, largely unstructured, and frequently changing, which makes it difficult to make service robots robust and versatile through manual programming. Having robots learn to solve tasks autonomously through interaction with the real world is an attractive alternative. With Reinforcement Learning (RL), a system can learn to perform tasks by receiving only coarse feedback on its actions: desired behavior is reinforced with positive rewards, while undesired behavior is punished with negative rewards. In this research, a bipedal walking robot named Leo was designed and built specifically to study the application of RL to real robots. Leo is able to learn two basic motor control tasks: placing a foot on a stair step, and walking. To learn to walk, Leo receives a positive reward for moving its foot forward and negative rewards for falling and for spending time and energy. This process takes about 5 hours of practice in simulation and involves thousands of falls. On the real prototype, the learning time was shortened by first letting the robot observe a hand-coded, sub-optimal controller, which it was able to mimic and even improve upon within a matter of hours. Algorithmic improvements are proposed to address complications of RL on real robots, such as time delays in the control loop and large disturbances like a sudden push. To reduce the continuous risk of damage inherent in the trial-and-error nature of RL, a modular approach is proposed through which the robot can quickly, if coarsely, learn about the riskiness of its behavior and then learn the actual task more safely and in greater detail.
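
The reward scheme for the walking task described above can be illustrated with a minimal sketch. The function name, signature, and all constants below are hypothetical weightings chosen for illustration, not values taken from the thesis:

```python
# Minimal sketch of the walking reward scheme described in the abstract:
# reward forward foot movement; penalize falling, time, and energy use.
# All names and constants are illustrative assumptions, not thesis values.

def walking_reward(foot_displacement_m: float, fell: bool,
                   dt_s: float, energy_used_j: float) -> float:
    """Coarse feedback for one control step of the walking task."""
    reward = 100.0 * foot_displacement_m   # reinforce moving the foot forward
    if fell:
        reward -= 10.0                     # punish falling
    reward -= 1.0 * dt_s                   # punish spending time
    reward -= 0.1 * energy_used_j          # punish spending energy
    return reward
```

Such a scalar signal is the only task specification the learner receives; the relative magnitudes of the terms determine the trade-off between walking speed, stability, and efficiency.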