Deep Reinforcement Learning for Bipedal Robots

Abstract

Reinforcement Learning (RL) is a general-purpose framework for designing controllers for non-linear systems. It learns a controller (policy) by trial and error, which makes it highly suitable for systems that are difficult to control with conventional control methodologies, such as walking robots. Traditionally, RL has been applicable only to problems with low-dimensional state spaces, but the use of deep neural networks as function approximators in RL has shown impressive results in controlling high-dimensional systems. This approach is known as Deep Reinforcement Learning (DRL).
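
As a minimal illustration of the idea (not taken from the thesis), a deep policy network maps a high-dimensional state vector directly to a continuous action; the state and action dimensions below are hypothetical placeholders for a bipedal robot:

```python
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    """Deep neural network used as a function approximator for the policy:
    it maps an observed state directly to a continuous action."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # actions bounded in [-1, 1]
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Hypothetical example: 14 state variables (joint angles, velocities, torso pose)
# and 6 actuated joints.
policy = PolicyNetwork(state_dim=14, action_dim=6)
action = policy(torch.randn(1, 14))
```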

A major drawback of DRL algorithms is that they generally require a large number of samples and long training times, which becomes a challenge when working with real robots. Therefore, most applications of DRL methods have been limited to simulation platforms. Moreover, due to model uncertainties such as friction and inaccuracies in parameters like masses and link lengths, a policy trained on a simulated model might not work directly on a real robot.

The objective of this thesis is to apply a DRL algorithm, Deep Deterministic Policy Gradient (DDPG), to a 2D bipedal robot. The bipedal robot used for the analysis, known as LEO, was developed by the Delft BioRobotics Lab for reinforcement learning research. The DDPG method is applied to a simulated model of LEO and compared with traditional RL methods such as SARSA. To overcome the high sample requirement of learning a policy on the real system, an iterative approach is developed in this thesis that first learns a difference model and then learns a new policy using this difference model. The difference model captures the mismatch between the real robot and the simulated model.
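
To make the iterative scheme concrete, the sketch below shows one plausible way to fit a difference model from a small batch of real-robot transitions and use it to correct the simulator. All names (`sim_step`, `real_transitions`, the regressor choice) are illustrative assumptions, not the thesis' actual implementation:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def learn_difference_model(real_transitions, sim_step):
    """Fit a regressor that predicts the gap between real and simulated dynamics.

    real_transitions: list of (state, action, real_next_state) tuples from the robot.
    sim_step(state, action): one-step prediction of the nominal simulator.
    """
    X, y = [], []
    for s, a, s_real in real_transitions:
        s_sim = sim_step(s, a)
        X.append(np.concatenate([s, a]))
        y.append(s_real - s_sim)  # mismatch the difference model must capture
    model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)
    model.fit(np.array(X), np.array(y))
    return model

def corrected_step(state, action, sim_step, diff_model):
    """Simulator dynamics corrected by the learned difference model."""
    s_sim = sim_step(state, action)
    correction = diff_model.predict(np.concatenate([state, action])[None, :])[0]
    return s_sim + correction

# Iterative scheme (hypothetical outline):
# 1. train a policy with DDPG on the nominal simulator
# 2. run that policy on the real robot for a small number of episodes
# 3. fit the difference model from the collected transitions
# 4. retrain the policy with DDPG, stepping through corrected_step instead of sim_step
# 5. repeat from step 2 until real-robot performance stops improving
```

The key design choice is that the expensive real-robot data is used only to fit the comparatively simple difference model, while the sample-hungry policy optimization runs entirely on the corrected simulator.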

The approach is tested in simulation on two experimental setups: an inverted pendulum and LEO. With the difference model, the learned policy is nearly optimal compared to a policy trained on the real system from scratch, while requiring only 10% of the samples.