An Empirical Study on Auxiliary Task Joint-Training for Diffusion Policy
Q. Luo (TU Delft - Mechanical Engineering)
C. Della Santina – Mentor (TU Delft - Mechanical Engineering)
Z. Li – Mentor (TU Delft - Mechanical Engineering)
J. Kober – Graduation committee member (TU Delft - Mechanical Engineering)
Abstract
While Diffusion Policy has emerged as a powerful framework for robotic manipulation due to its expressiveness in modeling complex action distributions, its deployment is heavily constrained by the high cost of collecting demonstrations. This study presents a systematic empirical investigation into whether joint training with visual auxiliary tasks can enhance the sample efficiency of diffusion policies under single-task spatial generalization (i.e., variations in object orientations and initial locations). Restricting observation inputs to raw 2D images and low-dimensional robot proprioception, we consider four candidate auxiliary tasks: image reconstruction, active object mask extraction, keypoint prediction, and optical flow estimation. We evaluate them within a joint-training framework on two simulated manipulation tasks and one real-world robotic task, using varying amounts of demonstration data. Our empirical findings show that joint training with auxiliary tasks does provide sample efficiency benefits, particularly in intermediate data regimes. However, in certain cases, optimization conflicts and gradient interference between the auxiliary and primary tasks diminish these benefits, especially in data-starved or data-rich regimes in simulation.
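To make the two central notions of the abstract concrete, here is a minimal sketch of a joint-training objective and of one common way to detect gradient interference. The abstract does not state how the losses are combined or how interference was measured, so the weighted-sum form, the cosine-similarity diagnostic, and the names `joint_loss`, `gradient_cosine`, and `aux_weights` are all illustrative assumptions, not the thesis's implementation.

```python
import math


def joint_loss(policy_loss, aux_losses, aux_weights):
    """Assumed form: primary diffusion loss plus a weighted sum of auxiliary losses."""
    if len(aux_losses) != len(aux_weights):
        raise ValueError("need exactly one weight per auxiliary loss")
    return policy_loss + sum(w * l for w, l in zip(aux_weights, aux_losses))


def gradient_cosine(g_policy, g_aux):
    """Cosine similarity between two flattened gradient vectors.

    A negative value means the auxiliary gradient points against the
    primary-task gradient on the shared parameters (interference).
    """
    dot = sum(a * b for a, b in zip(g_policy, g_aux))
    norm = math.sqrt(sum(a * a for a in g_policy)) * math.sqrt(sum(b * b for b in g_aux))
    return dot / norm if norm > 0 else 0.0


# Hypothetical numbers, for illustration only.
total = joint_loss(1.0, [2.0, 4.0], [0.5, 0.25])
conflict = gradient_cosine([1.0, 0.0], [-1.0, 0.0])
```

In a setup like this, the gradients would be taken with respect to the shared visual encoder; a persistently negative cosine would be one signal that an auxiliary task is working against the policy objective.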