An Empirical Study on Auxiliary Task Joint-Training for Diffusion Policy

Master Thesis (2026)
Author(s)

Q. Luo (TU Delft - Mechanical Engineering)

Contributor(s)

C. Della Santina – Mentor (TU Delft - Mechanical Engineering)

Z. Li – Mentor (TU Delft - Mechanical Engineering)

J. Kober – Graduation committee member (TU Delft - Mechanical Engineering)

Faculty
Mechanical Engineering
Publication Year
2026
Language
English
Graduation Date
30-06-2026
Awarding Institution
Delft University of Technology
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

While Diffusion Policy has emerged as a powerful framework for robotic manipulation due to its expressiveness in modeling complex action distributions, its deployment is heavily constrained by the high cost of collecting demonstrations. This study systematically investigates whether joint training with visual auxiliary tasks can improve the sample efficiency of diffusion policies under single-task spatial generalization (i.e., variations in object orientations and initial locations). Restricting observation inputs to raw 2D images and low-dimensional robot proprioception, we incorporate four candidate auxiliary tasks: image reconstruction, active object mask extraction, keypoint prediction, and optical flow estimation. We evaluate them within a joint-training framework on two simulated manipulation tasks and one real-world robotic task, using varying amounts of demonstration data. Our results show that joint training with auxiliary tasks does improve sample efficiency, particularly in intermediate data regimes. In certain cases, however, optimization conflicts and gradient interference between the auxiliary and primary tasks diminish these benefits, especially in data-starved or data-rich regimes under simulated settings.
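To make the two central ideas of the abstract concrete, the sketch below shows one common way to form a joint-training objective and to detect gradient interference. It is an illustration only: the abstract does not state the thesis's actual loss formulation or weighting, so the weighted-sum objective, the function names `joint_loss` and `gradient_cosine`, the task keys, and the weight values are all assumptions. A negative cosine similarity between the primary-task and auxiliary-task gradients is a widely used indicator that the two objectives pull shared parameters in opposing directions.

```python
import math


def joint_loss(policy_loss, aux_losses, aux_weights):
    """Hypothetical joint objective: primary loss plus weighted auxiliary losses.

    policy_loss: scalar loss of the primary (diffusion policy) task.
    aux_losses:  dict mapping an auxiliary task name to its scalar loss.
    aux_weights: dict mapping an auxiliary task name to its weight
                 (tasks without a weight contribute nothing).
    """
    total = policy_loss
    for name, loss in aux_losses.items():
        total += aux_weights.get(name, 0.0) * loss
    return total


def gradient_cosine(g_primary, g_aux):
    """Cosine similarity between two flattened gradient vectors.

    Values below zero suggest the auxiliary task conflicts with the
    primary task on the shared parameters; values above zero suggest
    the two tasks are aligned.
    """
    dot = sum(a * b for a, b in zip(g_primary, g_aux))
    norm_p = math.sqrt(sum(a * a for a in g_primary))
    norm_a = math.sqrt(sum(b * b for b in g_aux))
    if norm_p == 0.0 or norm_a == 0.0:
        return 0.0  # undefined direction; treat as no measurable interaction
    return dot / (norm_p * norm_a)


# Illustrative values only (not results from the thesis).
example_total = joint_loss(
    1.0,
    {"reconstruction": 2.0, "optical_flow": 4.0},
    {"reconstruction": 0.5, "optical_flow": 0.25},
)
example_conflict = gradient_cosine([1.0, 0.0], [-1.0, 0.0])
```

In a real training loop the gradient vectors would come from backpropagating each loss separately through the shared visual encoder; here they are plain lists so the sketch stays self-contained.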
