Ev2Act-Grasp: Event-to-action Deep Reinforcement Learning for Dynamic Object Grasping
W. Xia (TU Delft - Mechanical Engineering)
C. Della Santina – Mentor (TU Delft - Learning & Autonomous Control)
C. Zhang – Mentor (TU Delft - Learning & Autonomous Control)
M. Wiertlewski – Graduation committee member (TU Delft - Human-Robot Interaction)
Abstract
Vision-based robotic grasping has become a widely adopted approach in both industrial and domestic settings, enabling robots to perceive and interact with their environments. While progress has been made in grasping static objects or objects with predictable motion, dynamic object grasping remains challenging because it requires reactive perception, motion prediction, and real-time control. Traditional systems often rely on conventional RGB cameras, whose low temporal resolution and motion blur lead to perception latency or missed fast motion. Moreover, many previous grasping approaches require precise object or robot models, which are either difficult to obtain or fail in unstructured, fast-changing environments. To address these challenges, Ev2Act-Grasp is proposed: an end-to-end, event-vision-based deep reinforcement learning grasping system that directly maps event-frame inputs to 3D Cartesian control actions to perform a tracking–grasping task. The system operates fully end to end, without prior object models, handcrafted features, or supervised perception modules, and uses a flexible eye-in-hand configuration to handle randomly moving spheres. Ev2Act-Grasp is evaluated in simulation on a Franka Panda robot, where it achieves a 100% tracking success rate and a 66% grasping success rate in the clean scenario, while maintaining over 75% tracking success in moderately cluttered environments. Furthermore, the system demonstrates zero-shot sim-to-real transfer, achieving successful grasps across clean, cluttered, and low-light environments under various object motions.
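To make the event-to-action mapping concrete, the sketch below shows one possible policy interface in PyTorch: a small convolutional encoder over accumulated event frames followed by a head that outputs a bounded 3D Cartesian command plus a gripper signal. The network shape, channel counts, and the name `EventToActionPolicy` are illustrative assumptions for exposition, not the architecture used in the thesis.

```python
import torch
import torch.nn as nn


class EventToActionPolicy(nn.Module):
    """Illustrative sketch (assumed layout): event frames -> Cartesian action."""

    def __init__(self, in_channels: int = 2, action_dim: int = 4):
        super().__init__()
        # CNN encoder over an event frame with on/off polarity channels.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # MLP head producing actions bounded in [-1, 1]:
        # (dx, dy, dz, gripper), scaled downstream to the robot's limits.
        self.head = nn.Sequential(
            nn.Linear(32, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh(),
        )

    def forward(self, event_frames: torch.Tensor) -> torch.Tensor:
        # event_frames: (batch, in_channels, H, W) accumulated event image.
        return self.head(self.encoder(event_frames))


if __name__ == "__main__":
    policy = EventToActionPolicy()
    frames = torch.zeros(1, 2, 64, 64)   # dummy event frame
    action = policy(frames)              # shape (1, 4)
    print(action.shape)
```

In an actual DRL setup this module would serve as the actor of an on- or off-policy algorithm, with the eye-in-hand event camera supplying the frames at each control step; those training details are not specified in the abstract.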
Files
File under embargo until 17-09-2026