Reinforcement Learning from Simulation to Real World Autonomous Driving using Digital Twin

Abstract

Autonomous driving is attracting growing attention because of the potential benefits it offers in safety, leisure, energy efficiency, reduced emissions, and reduced traffic. Current research focuses on artificial-intelligence methods for complex planning and decision-making tasks, object detection, and simultaneous localization and mapping.

However, training and testing these methods in the real world is unsafe and can be harmful to other road users. For that reason, the focus is on developing such methods first in a simulated, controlled environment and then transferring the control strategy to the target domain. The issue with this approach is that performance drops when the domain changes, a problem known as the sim2real gap. In addition, certain methods cannot be deployed in real-time applications because their computational cost is too high to produce outputs within the required time frame.

To date, research on reducing the sim2real gap for autonomous driving has focused on domain adaptation or domain randomization techniques; however, none has combined them with a high-fidelity vehicle dynamics model. This work aims to validate a safe transfer learning approach that reduces the reality gap in a zero-shot manner by combining the advantages of simulated and real-world data, a digital twin of the vehicle, and a traffic scenario simulator.

To test this framework, reinforcement learning agents are trained to track different paths. The evaluation also covers the influence of using a high-fidelity model and real-world data, as well as the effect of the reward function on the control strategy of the learning agents.

The training process of this transfer learning framework starts with virtually generated scenarios, which are simpler and noise-free compared with their real-life counterparts. The motion of the learning agent is simulated with the digital twin, whose parameters are randomized in every episode; in addition, noise is added to the control action. These randomizations help the controller operate under uncertainty and avoid overfitting to model inaccuracies. Once the performance saturates, real-world logged data is included so that the learning agent adapts to the target distribution, i.e., its noise levels and driving style. After the performance stops improving, the results are tested in Model-in-the-Loop and Vehicle-in-the-Loop with the SimRod, an all-electric drive-by-wire vehicle.
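To make the episode structure concrete, the loop below is a minimal sketch of per-episode domain randomization with control-action noise. It is not the study's implementation: a toy kinematic bicycle model stands in for the high-fidelity digital twin, and all names and parameter ranges (randomize_params, steer_gain, the noise level) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def randomize_params():
    # Hypothetical stand-in for digital-twin parameter randomization:
    # each episode draws vehicle parameters from plausible ranges.
    return {
        "wheelbase": rng.uniform(2.4, 2.8),   # m
        "steer_gain": rng.uniform(0.9, 1.1),  # actuator gain error
        "dt": 0.05,                           # s, control period
    }

def step(state, action, p):
    # Toy kinematic bicycle model in place of the high-fidelity twin.
    x, y, yaw, v = state
    steer, accel = p["steer_gain"] * action[0], action[1]
    x += v * np.cos(yaw) * p["dt"]
    y += v * np.sin(yaw) * p["dt"]
    yaw += v / p["wheelbase"] * np.tan(steer) * p["dt"]
    v = max(0.0, v + accel * p["dt"])
    return np.array([x, y, yaw, v])

def policy(state):
    # Placeholder for the learned actor (e.g., a PPO or SAC network);
    # here, a simple stabilizing law that tracks the x-axis.
    return np.array([-0.5 * state[1] - 0.8 * state[2], 0.2])

for episode in range(1000):
    p = randomize_params()                   # new twin parameters every episode
    state = np.array([0.0, 1.0, 0.0, 5.0])   # start 1 m off the path
    for t in range(200):
        action = policy(state)
        action += rng.normal(0.0, 0.02, size=action.shape)  # control-action noise
        state = step(state, action, p)
        # reward computation and the RL update would go here
```

Under this reading, the second training stage would reuse the same loop but replay episodes drawn from real-world logs, so the policy adapts to the target noise levels and driving style.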

The results of this study indicate that this zero-shot transfer learning framework yields substantially better tracking accuracy in the physical world (about 30% improvement on average) than controllers trained only with lower-fidelity models or synthetically generated data. This study stresses that a combination of the three sim2real methods (domain randomization, domain adaptation, and a high-fidelity digital twin) with both synthetically generated and real-world data is necessary to reduce the reality gap. Additional results cover the influence of the reward function design and of the choice of reinforcement learning algorithm (proximal policy optimization and soft actor-critic).
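As an illustration of the reward-design dimension examined in the study, a path-tracking reward typically trades off cross-track error, heading error, and control effort; shifting the weights changes how aggressively the learned controller corrects deviations. The function below is a hedged sketch with hypothetical weights, not the reward used in this work.

```python
def tracking_reward(cross_track_err, heading_err, steer_rate,
                    w_ct=1.0, w_head=0.5, w_act=0.1):
    # Hypothetical weights: w_ct penalizes lateral deviation from the path,
    # w_head penalizes heading misalignment, and w_act discourages rapid
    # steering, trading tracking accuracy against control smoothness.
    return -(w_ct * cross_track_err ** 2
             + w_head * heading_err ** 2
             + w_act * steer_rate ** 2)
```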