Gradient-based adversarial domain randomization


Abstract

Recent advances in differentiable simulators offer a promising route to improving the sim2real transfer of reinforcement learning (RL) agents, since they allow gradients of the simulator's dynamics to be computed with respect to its parameters. In practice, however, these gradients are only useful in specific scenarios. In this thesis, we address these limitations by proposing methods that obtain accurate gradients through the use of a privileged value function. This approach yields insight into when simulator gradients are effective and shows that, in certain cases, they can significantly improve sim2real performance. To illustrate this, we develop an adversary that uses local gradients to identify the worst-case domain parameters for a given policy. Our experiments are conducted on the Pendulum swing-up environment. This thesis lays the groundwork for further exploration of how differentiable simulator gradients can be leveraged.
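
To make the core idea concrete, the following is a minimal sketch of such a gradient-based adversary in JAX. Everything in it is an illustrative assumption rather than the thesis's implementation: the pendulum dynamics, the fixed PD controller standing in for a trained policy, and the short differentiable rollout used in place of a learned privileged value function. The adversary simply performs gradient descent on the domain parameters (mass and length) to drive the policy's return down, i.e. toward locally worst-case dynamics.

```python
import jax
import jax.numpy as jnp

# Hypothetical differentiable pendulum step; the physical parameters
# (mass m, length l) are the domain parameters the adversary attacks.
def pendulum_step(state, torque, params, dt=0.05, g=9.81):
    theta, theta_dot = state
    m, l = params["m"], params["l"]
    theta_ddot = (3 * g / (2 * l)) * jnp.sin(theta) + 3 * torque / (m * l**2)
    theta_dot = theta_dot + theta_ddot * dt
    theta = theta + theta_dot * dt
    return jnp.stack([theta, theta_dot])

# Stand-in policy: a fixed PD controller toward the upright position
# (a trained swing-up policy would be used in the thesis's setting).
def policy(state):
    theta, theta_dot = state
    return -2.0 * theta - 0.5 * theta_dot

# Return of a short differentiable rollout; a learned privileged value
# function could take the place of this quantity.
def rollout_return(params, init_state, horizon=50):
    def body(state, _):
        torque = policy(state)
        next_state = pendulum_step(state, torque, params)
        reward = -(next_state[0] ** 2)  # penalize deviation from upright
        return next_state, reward

    _, rewards = jax.lax.scan(body, init_state, None, length=horizon)
    return rewards.sum()

# Adversary: gradient descent on the domain parameters to minimize the
# policy's return, i.e. to find locally worst-case dynamics.
@jax.jit
def adversary_step(params, init_state, lr=1e-2):
    grads = jax.grad(rollout_return)(params, init_state)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

params = {"m": jnp.array(1.0), "l": jnp.array(1.0)}
state = jnp.array([jnp.pi, 0.0])  # pendulum hanging down
for _ in range(100):
    params = adversary_step(params, state)
print(params)  # locally worst-case mass and length for this controller
```

In the thesis's setting, the rollout return above would presumably be replaced by the privileged value function so that accurate local gradients are available without differentiating through long simulated trajectories.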