ADP3: Affordance-Guided Generalizable Visuomotor Policies through 3D Action Diffusion
K. Biju Nair (TU Delft - Mechanical Engineering)
J. Kober – Mentor (TU Delft - Learning & Autonomous Control)
Jeyhoon Maskani – Mentor (Neura Robotics GmbH)
Milad Malekzadeh – Mentor (Neura Robotics GmbH)
Chirag Raman – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
J.M. Prendergast – Graduation committee member (TU Delft - Human-Robot Interaction)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Recent progress in visual imitation learning has shown that diffusion models are a powerful tool for training robots to perform complex manipulation tasks. While 3D Diffusion Policy uses a point cloud representation to improve spatial reasoning and sample efficiency, it still struggles to generalize to novel objects and environments because it learns spurious correlations from task-irrelevant visual features. This work introduces Affordance-guided 3D Diffusion Policy (ADP3), a novel approach that integrates task-relevant affordance cues into the policy's point cloud input. By conditioning the policy on 3D affordance heatmaps instead of raw point clouds, ADP3 is biased to attend to task-relevant object regions. Across four Meta-World tasks, affordance heatmaps reduced the success-rate drop on unseen objects to just 3%, compared with a 35% drop when using raw point clouds. ADP3 also performs strongly in real-world experiments, remaining robust to cluttered scenes and novel object orientations.
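To make the conditioning idea concrete, the sketch below shows one plausible way a per-point affordance heatmap could be attached to a point cloud observation before it is fed to a policy. This is a minimal illustration, not the thesis implementation: the function name `augment_with_affordance`, the Gaussian heatmap around a hypothetical target point, and the (N, 4) feature layout are all assumptions made for this example.

```python
import numpy as np

def augment_with_affordance(points, heatmap):
    """Append a per-point affordance score as an extra feature channel.

    points:  (N, 3) array of XYZ coordinates
    heatmap: (N,) array of affordance scores, clipped to [0, 1]
    Returns an (N, 4) array the policy consumes instead of raw XYZ alone.
    """
    heatmap = np.clip(heatmap, 0.0, 1.0)
    return np.concatenate([points, heatmap[:, None]], axis=1)

# Toy example: 5 random points with a Gaussian affordance peak
# centered on a hypothetical task-relevant location.
rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(5, 3))
target = np.array([0.2, 0.0, 0.5])              # assumed region of interest
dist = np.linalg.norm(pts - target, axis=1)
heat = np.exp(-(dist ** 2) / (2 * 0.1 ** 2))     # sharper peak = stronger bias
obs = augment_with_affordance(pts, heat)
print(obs.shape)
```

Points near the target receive scores close to 1 and distant points close to 0, so a downstream network can weight its attention toward task-relevant regions rather than learning spurious correlations from the full scene.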