ADP3: Affordance-Guided Generalizable Visuomotor Policies through 3D Action Diffusion



Master Thesis (2025)

Author(s)

K. Biju Nair (TU Delft - Mechanical Engineering)

Contributor(s)

J. Kober – Mentor (TU Delft - Learning & Autonomous Control)

Jeyhoon Maskani – Mentor (Neura Robotics GmbH)

Milad Malekzadeh – Mentor (Neura Robotics GmbH)

C.A. Raman – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

J.M. Prendergast – Graduation committee member (TU Delft - Human-Robot Interaction)

Faculty

Mechanical Engineering

Keywords

Visual Imitation Learning; Pretrained Visual Representations; Generalization in Manipulation

To reference this document use:

https://resolver.tudelft.nl/uuid:56447a22-fbf3-4d12-92e2-444428aeb5e8


Publication Year

2025

Language

English

Graduation Date

08-08-2025

Awarding Institution

Delft University of Technology

Programme

Mechanical Engineering | Vehicle Engineering | Cognitive Robotics

Abstract

Recent progress in visual imitation learning has shown that diffusion models are a powerful tool for training robots to perform complex manipulation tasks. Although 3D Diffusion Policy uses a point cloud representation to improve spatial reasoning and sample efficiency, it still struggles to generalize to novel objects and environments because it learns spurious correlations from task-irrelevant visual features. This work introduces Affordance-guided 3D Diffusion Policy (ADP3), a novel approach that integrates task-relevant affordance cues into the policy's point cloud input. Conditioning the policy on 3D affordance heatmaps instead of raw point clouds biases it to attend to task-relevant object regions. On unseen objects in four Meta-World tasks, using affordance heatmaps limited the drop in success rate to just 3%, compared to a 35% drop when using raw point clouds. ADP3 also performs strongly in real-world experiments, showing resilience to cluttered scenes and novel object orientations.
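The abstract does not spell out how the affordance heatmap is fed to the policy. A minimal sketch, assuming one plausible encoding, treats the 3D affordance heatmap as the same point cloud with an extra per-point score channel `[x, y, z, a]`. That cloud is then passed through a PointNet-style encoder (shared per-point MLP followed by max pooling) to produce a global feature that could condition the diffusion policy. The function names `affordance_point_cloud` and `encode`, the random weights, and the Gaussian toy heatmap around a hypothetical "handle" location are all illustrative assumptions, not the thesis implementation.

```python
import numpy as np


def affordance_point_cloud(points: np.ndarray, affordance: np.ndarray) -> np.ndarray:
    """Append a per-point affordance score as a 4th channel (assumed encoding).

    points:     (N, 3) xyz coordinates
    affordance: (N,) scores, assumed to come from some affordance predictor
    returns:    (N, 4) array with columns [x, y, z, a], a clipped to [0, 1]
    """
    if points.ndim != 2 or points.shape[1] != 3:
        raise ValueError("points must have shape (N, 3)")
    if affordance.shape != (points.shape[0],):
        raise ValueError("affordance must have shape (N,)")
    a = np.clip(affordance, 0.0, 1.0)
    return np.concatenate([points, a[:, None]], axis=1)


def encode(cloud: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """PointNet-style encoder: shared per-point MLP, then max-pool over points.

    Max pooling makes the global feature invariant to the order of the points.
    """
    h = np.maximum(cloud @ w1, 0.0)  # (N, H): same weights applied to every point, ReLU
    h = np.maximum(h @ w2, 0.0)      # (N, D)
    return h.max(axis=0)             # (D,): global feature for policy conditioning


# Toy usage: a random cloud with a hypothetical affordance blob near one point.
rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(512, 3))
centre = np.array([0.2, 0.0, 0.1])  # hypothetical task-relevant region
aff = np.exp(-np.linalg.norm(pts - centre, axis=1) ** 2 / 0.05)
cloud = affordance_point_cloud(pts, aff)
w1 = rng.normal(size=(4, 64)) * 0.5
w2 = rng.normal(size=(64, 128)) * 0.1
feat = encode(cloud, w1, w2)
print(cloud.shape, feat.shape)  # (512, 4) (128,)
```

Keeping the xyz coordinates and adding the score as a channel lets the encoder still see scene geometry while the affordance value marks which points matter for the task. An alternative under the same assumptions would be to mask or reweight points by their score before encoding.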

Files

Karthik-thesis-report.pdf

(PDF | 2.12 MB)

License info not available