End-to-end behavior cloning agent for an object handover task

Training and evaluating a robot to perform simulated human-to-robot object handovers without requiring hand-object segmentation

Master Thesis (2025)
Author(s)

Y. Watabe (TU Delft - Mechanical Engineering)

Contributor(s)

Y.B. Eisma – Mentor (TU Delft - Human-Robot Interaction)

Y.B. Eisma – Graduation committee member (TU Delft - Human-Robot Interaction)

D. Dodou – Graduation committee member (TU Delft - Medical Instruments & Bio-Inspired Technology)

R. Zhang – Graduation committee member (TU Delft - Human-Robot Interaction)

Faculty
Mechanical Engineering
Publication Year
2025
Language
English
Graduation Date
28-04-2025
Awarding Institution
Delft University of Technology
Programme
Mechanical Engineering | Vehicle Engineering | Cognitive Robotics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Current visuomotor manipulators train their perception, planning, and action components jointly in an end-to-end framework to avoid hand-engineered components. Despite this, existing methods for human-to-robot object handover tasks still require a perception component that segments the hand from the object, which can introduce error propagation. This study therefore investigates the applicability of an end-to-end framework that eliminates the need for hand-object segmentation in a simulated human-to-robot object handover task using HandoverSim.

To this end, a behavior cloning agent converts camera input into an RGB-D voxel space and outputs discretized 6-DoF manipulation actions, allowing it to discover features for the handover task directly. This study introduces a framework that combines the behavior cloning agent with HandoverSim, which allows experimenting with various training configurations. These configurations consist of experiments with: 1) expert demonstration data; 2) optimal camera setup; 3) handover objects; and 4) voxel-based RGB augmentation techniques.
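To make the pipeline concrete, the sketch below illustrates the three ingredients named above: voxelizing an RGB-D frame, discretizing a 6-DoF pose, and a voxel-based RGB augmentation. This is a minimal illustration only; the grid size, workspace bounds, camera intrinsics, and all function names (voxelize_rgbd, discretize_action, augment_rgb_voxels) are assumptions for the sketch, not the thesis' actual implementation.

```python
import numpy as np

GRID = 64                # voxels per axis (assumed)
WORKSPACE = (-0.5, 0.5)  # cubic workspace bounds in metres (assumed)

def voxelize_rgbd(rgb, depth, fx, fy, cx, cy):
    """Back-project an RGB-D image with pinhole intrinsics and scatter
    the resulting coloured points into a GRID^3 RGB voxel volume."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    cols = rgb.reshape(-1, 3).astype(np.float32) / 255.0
    lo, hi = WORKSPACE
    idx = ((pts - lo) / (hi - lo) * GRID).astype(int)
    keep = np.all((idx >= 0) & (idx < GRID), axis=1)  # drop points outside the workspace
    vol = np.zeros((GRID, GRID, GRID, 3), dtype=np.float32)
    vol[idx[keep, 0], idx[keep, 1], idx[keep, 2]] = cols[keep]
    return vol

def discretize_action(translation, rotation_euler, n_rot_bins=72):
    """Map a continuous 6-DoF gripper pose to discrete bins:
    translation -> voxel indices, rotation -> Euler-angle bins."""
    lo, hi = WORKSPACE
    t_bins = np.clip(((translation - lo) / (hi - lo) * GRID).astype(int),
                     0, GRID - 1)
    r_bins = (((rotation_euler + np.pi) / (2 * np.pi)) * n_rot_bins
              ).astype(int) % n_rot_bins
    return t_bins, r_bins

def augment_rgb_voxels(vol, rng, jitter=0.1):
    """One assumed form of voxel-based RGB augmentation: jitter the
    colour channels of occupied voxels while leaving geometry intact."""
    occupied = vol.any(axis=-1, keepdims=True)
    noise = rng.uniform(-jitter, jitter, size=vol.shape).astype(np.float32)
    return np.clip(vol + noise * occupied, 0.0, 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.integers(0, 256, (480, 640, 3), dtype=np.uint8)
    depth = rng.uniform(0.3, 1.0, (480, 640)).astype(np.float32)
    vol = augment_rgb_voxels(voxelize_rgbd(rgb, depth, 600, 600, 320, 240), rng)
    t, r = discretize_action(np.array([0.1, -0.2, 0.05]),
                             np.array([0.0, np.pi / 2, -np.pi / 4]))
    print(vol.shape, t, r)  # (64, 64, 64, 3), voxel indices, rotation bins
```

Discretizing the pose this way turns 6-DoF regression into classification over voxel indices and angle bins, which is one common design choice for voxel-based behavior cloning agents.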

The trained model is evaluated on its generalization to diverse handover conditions in the HandoverSim benchmark. The results demonstrate that the behavior cloning agent can learn features for the handover task without requiring a dedicated perception component: the model learns the grasp-object relation while minimizing contact with the hand. However, performance is limited by sparse training data and grasping accuracy.
