Ev2Act-Grasp: Event-to-action Deep Reinforcement Learning for Dynamic Object Grasping
W. Xia (TU Delft - Mechanical Engineering)
C. Della Santina – Mentor (TU Delft - Learning & Autonomous Control)
C. Zhang – Mentor (TU Delft - Learning & Autonomous Control)
M. Wiertlewski – Graduation committee member (TU Delft - Human-Robot Interaction)
Abstract
Vision-based robotic grasping has become a widely adopted approach in both industrial and domestic settings, enabling robots to perceive and interact with their environments. While progress has been made in grasping static objects or objects with predictable motion, dynamic object grasping remains challenging because it requires reactive perception, motion prediction, and real-time control. Traditional systems often rely on conventional RGB cameras, whose low temporal resolution and motion blur lead to perception latency or missed fast motion. Moreover, many previous grasping approaches require precise object or robot models, which are either difficult to obtain or fail in unstructured, fast-changing environments. To address these challenges, Ev2Act-Grasp is proposed: an end-to-end, event-vision-based deep reinforcement learning grasping system that directly maps event-frame inputs to 3D Cartesian control actions to perform a tracking–grasping task. The system operates fully end to end, without prior object models, handcrafted features, or supervised perception modules, and uses a flexible eye-in-hand configuration to handle randomly moving spheres. Ev2Act-Grasp is evaluated in simulation on a Franka Panda robot, where it achieves a 100% tracking success rate and a 66% grasping success rate in the clean scenario, while maintaining over 75% tracking success in moderately cluttered environments. Furthermore, the system demonstrates zero-shot sim-to-real transfer, achieving successful grasps across clean, cluttered, and low-light environments under various object motions.
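To make the event-to-action mapping concrete, the sketch below shows one possible policy interface in PyTorch: a small convolutional encoder over accumulated event frames followed by a head that outputs a bounded 3D Cartesian command plus a gripper signal. The network shape, channel counts, and the name `EventToActionPolicy` are illustrative assumptions for exposition, not the architecture used in the thesis.

```python
import torch
import torch.nn as nn


class EventToActionPolicy(nn.Module):
    """Illustrative sketch (assumed layout): event frames -> Cartesian action."""

    def __init__(self, in_channels: int = 2, action_dim: int = 4):
        super().__init__()
        # CNN encoder over an event frame with on/off polarity channels.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # MLP head producing actions bounded in [-1, 1]:
        # (dx, dy, dz, gripper), scaled downstream to the robot's limits.
        self.head = nn.Sequential(
            nn.Linear(32, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh(),
        )

    def forward(self, event_frames: torch.Tensor) -> torch.Tensor:
        # event_frames: (batch, in_channels, H, W) accumulated event image.
        return self.head(self.encoder(event_frames))


if __name__ == "__main__":
    policy = EventToActionPolicy()
    frames = torch.zeros(1, 2, 64, 64)   # dummy event frame
    action = policy(frames)              # shape (1, 4)
    print(action.shape)
```

In an actual DRL setup this module would serve as the actor of an on- or off-policy algorithm, with the eye-in-hand event camera supplying the frames at each control step; those training details are not specified in the abstract.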
Files
File under embargo until 17-09-2026