Reinforcement learning (RL) agents often achieve impressive results in simulation but can fail catastrophically when facing small deviations at deployment time. In this work, we examine the brittleness of Proximal Policy Optimization (PPO) agents under test-time observation noise and evaluate techniques for improving robustness. We compare four variants: feed-forward PPO; Recurrent PPO, which adds LSTM memory; Noisy-PPO, which is trained with injected observation noise; and Recurrent-Noisy PPO, which combines both. Each variant is evaluated on two benchmarks, the classic CartPole-v1 and the more realistic Highway-env. Performance is measured over 100 episodes per corruption level, using mean return, success rate, and the area under the degradation curve (AUDC) as robustness metrics. Our results show that noise-augmented training yields the largest gains: Noisy-PPO maintains its clean-condition performance even at high noise levels, while recurrence alone offers only modest improvement. In Highway-env, both noise injection and LSTM memory improve returns, indicating that simply adding noise augmentation or recurrence to training can enhance PPO’s robustness to real-world observation uncertainty.
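
Below is a minimal sketch of the two ingredients mentioned above, assuming a Gymnasium-style environment, zero-mean Gaussian observation noise, and a trapezoidal AUDC normalized by the noise range; the paper's exact noise model and AUDC definition may differ, and all names and values here are illustrative.

```python
import gymnasium as gym
import numpy as np


class NoisyObservation(gym.ObservationWrapper):
    """Adds zero-mean Gaussian noise to each observation.

    A wrapper like this can be used both to train a Noisy-PPO variant and
    to corrupt observations at evaluation time.
    """

    def __init__(self, env, sigma):
        super().__init__(env)
        self.sigma = sigma

    def observation(self, obs):
        noise = self.sigma * np.random.standard_normal(obs.shape)
        return (obs + noise).astype(obs.dtype)


def audc(noise_levels, mean_returns):
    """Area under the degradation curve via the trapezoidal rule,
    normalized by the noise range so the result stays on the return scale."""
    noise_levels = np.asarray(noise_levels, dtype=float)
    mean_returns = np.asarray(mean_returns, dtype=float)
    widths = np.diff(noise_levels)
    area = np.sum(0.5 * (mean_returns[1:] + mean_returns[:-1]) * widths)
    return area / (noise_levels[-1] - noise_levels[0])


# Hypothetical usage: mean return over 100 episodes at each noise level
# (numbers are made up for illustration, not results from the paper).
sigmas = [0.0, 0.05, 0.1, 0.2, 0.4]
returns = [500.0, 498.0, 471.0, 388.0, 212.0]
print(f"AUDC: {audc(sigmas, returns):.1f}")
```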