Gradient based adversarial domain randomization

Master Thesis (2024)
Author(s)

G. Koning (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Matthijs T. J. Spaan – Mentor (TU Delft - Sequential Decision Making)

J.W. Böhmer – Mentor (TU Delft - Sequential Decision Making)

D.S. van der Heijden – Mentor (TU Delft - Learning & Autonomous Control)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
04-06-2024
Awarding Institution
Delft University of Technology
Programme
Computer Science | Algorithmics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Recent advances in differentiable simulators offer a promising approach to enhancing the sim2real transfer of reinforcement learning (RL) agents by enabling the computation of gradients of the simulator’s dynamics with respect to its parameters. However, the application of these gradients is often limited to specific scenarios. In this thesis, we address these limitations by proposing methods to obtain accurate gradients through the use of a privileged value function. This approach provides valuable insights into the effectiveness of differentiable-simulator gradients and demonstrates that, in certain cases, it can significantly improve sim2real performance. To illustrate this, we develop an adversary that identifies the worst-case domain parameters for a given policy using local gradients. Our experiments are conducted on the Pendulum swing-up environment. This thesis forms the basis for further exploration of how differentiable-simulator gradients can be leveraged.
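The idea of an adversary that follows local gradients toward worst-case domain parameters can be illustrated with a minimal sketch. It is not the method from the thesis: the privileged value function is not modelled, a fixed PD controller stands in for a trained policy, and hand-derived forward sensitivities stand in for a differentiable simulator. The dynamics and cost are modelled on the commonly used Gymnasium Pendulum-v1 equations, which is an assumption about the environment. All names (`rollout_cost_and_grad`, `worst_case_mass`, the gains `kp` and `kd`, the mass bounds) are hypothetical.

```python
import math


def rollout_cost_and_grad(mass, steps=100, dt=0.05, g=10.0, length=1.0,
                          kp=20.0, kd=5.0):
    """Roll out a fixed PD policy and return (total cost, d cost / d mass).

    The derivative is propagated alongside the state (forward sensitivity),
    which is what a differentiable simulator would provide automatically.
    """
    th, w = 0.5, 0.0      # angle from upright (rad) and angular velocity
    dth, dw = 0.0, 0.0    # sensitivities of th and w with respect to mass
    cost, dcost = 0.0, 0.0
    for _ in range(steps):
        # Fixed PD policy (assumed stand-in for a trained RL policy).
        u = -kp * th - kd * w
        du = -kp * dth - kd * dw
        # Quadratic step cost and its derivative with respect to mass.
        cost += th * th + 0.1 * w * w + 0.001 * u * u
        dcost += 2.0 * th * dth + 0.2 * w * dw + 0.002 * u * du
        # Pendulum acceleration; the torque term scales with 1 / mass.
        gain = 3.0 / (mass * length ** 2)
        acc = 1.5 * g / length * math.sin(th) + gain * u
        dacc = (1.5 * g / length * math.cos(th) * dth
                + gain * du - gain / mass * u)
        # Semi-implicit Euler step for the state and its sensitivities.
        w, dw = w + acc * dt, dw + dacc * dt
        th, dth = th + w * dt, dth + dw * dt
    return cost, dcost


def worst_case_mass(mass0, low=0.5, high=2.0, step=0.05, iters=40):
    """Projected signed-gradient ascent on the policy's cost over the mass."""
    mass = min(max(mass0, low), high)
    best_mass, best_cost = mass, -math.inf
    for _ in range(iters):
        cost, grad = rollout_cost_and_grad(mass)
        if cost > best_cost:  # keep the worst case seen so far
            best_mass, best_cost = mass, cost
        direction = (grad > 0) - (grad < 0)  # sign of the local gradient
        mass = min(max(mass + step * direction, low), high)
    return best_mass, best_cost


if __name__ == "__main__":
    nominal_cost, _ = rollout_cost_and_grad(1.0)
    adv_mass, adv_cost = worst_case_mass(1.0)
    print(f"nominal cost {nominal_cost:.4f}, "
          f"adversarial mass {adv_mass:.2f}, cost {adv_cost:.4f}")
```

The signed step makes the update independent of the gradient's scale, and the projection keeps the mass inside an assumed plausible range. Because the search is purely local, it may stop at a local worst case rather than the global one.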

Files

Report_final_14_06_24.pdf
(pdf | 1.14 MB)
License info not available