Exploring the Impact of PPO Cost Functions on Skill Mutation in Robotic Pouring Tasks
J. van Buuren (TU Delft - Mechanical Engineering)
L. Peternel – Mentor (TU Delft - Human-Robot Interaction)
Micah Prendergast – Graduation committee member (TU Delft - Human-Robot Interaction)
Abstract
This thesis explores how deliberate modifications to reward function design can induce skill mutations in robotic reinforcement learning, specifically within a precision pouring task. Using a simulated Franka Emika Panda robot in NVIDIA Isaac Lab, we evaluate 25 distinct reward configurations composed of weighted terms for effort, accuracy, and velocity. The resulting policies exhibit a wide range of behaviors, from fast and efficient pours to novel skills such as rim cleaning, mixing, and watering. Our analysis shows that even small changes in reward structure can lead to significant shifts in policy behavior, yielding both task-optimal and creative, potentially transferable strategies. We validate this concept with the Proximal Policy Optimization (PPO) algorithm, showing that reward design alone, without altering the learning architecture, can drive meaningful skill diversification. This approach offers promising directions for adaptive control, transfer learning, and multi-objective optimization in robotic systems.
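To make the idea of "weighted reward terms" concrete, the sketch below shows one way such a reward could be composed in Python. It is an illustrative assumption, not the thesis's actual implementation: the field names (poured_mass, target_mass, joint_torques, ee_velocity) and the specific penalty forms are hypothetical placeholders standing in for whatever observations the Isaac Lab environment exposes.

```python
import numpy as np

def pouring_reward(state, action, weights):
    """Hypothetical weighted reward for a precision pouring task.

    `state` and `action` are assumed to be dictionaries exposing the
    poured-mass error, end-effector velocity, and joint torques; none
    of these names or penalty forms come from the thesis itself.
    """
    # Accuracy: penalize deviation between poured and target liquid mass.
    accuracy_term = -abs(state["poured_mass"] - state["target_mass"])

    # Effort: penalize squared joint torques (actuation cost).
    effort_term = -np.sum(np.square(action["joint_torques"]))

    # Velocity: penalize end-effector speed to discourage abrupt motion.
    velocity_term = -np.linalg.norm(state["ee_velocity"])

    return (weights["accuracy"] * accuracy_term
            + weights["effort"] * effort_term
            + weights["velocity"] * velocity_term)


# One possible weight configuration; sweeping such weights is what the
# abstract refers to as evaluating distinct reward configurations.
example_weights = {"accuracy": 1.0, "effort": 0.1, "velocity": 0.05}
```

Under this framing, each of the 25 reward configurations would correspond to a different choice of the weight triple, with the PPO policy trained from scratch on each configuration.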