Vision-based robotic grasping has become a widely adopted approach in both industrial and domestic settings, enabling robots to perceive and interact with their environments. While progress has been made in grasping static or predictable objects, dynamic object grasping remains a challenging problem due to the need for reactive perception, motion prediction, and real-time control. Traditional systems often rely on conventional RGB cameras, whose low temporal resolution and motion blur cause perception latency or missed fast movements. Moreover, many previous grasping approaches require precise object or robot models, which are either difficult to obtain or fail in unstructured or fast-changing environments. To address these challenges, we propose Ev2Act-Grasp, an end-to-end event-vision-based deep reinforcement learning grasping system that directly maps event-frame inputs to 3D Cartesian control actions to perform a tracking-grasping task. The system operates in a fully end-to-end manner, without prior object models, handcrafted features, or supervised perception modules, using a flexible eye-in-hand configuration to handle randomly moving spheres. Ev2Act-Grasp is evaluated in simulation on a Franka Panda robot and achieves a 100% tracking success rate and a 66% grasping success rate in the clean scenario, while maintaining over 75% tracking success in moderately cluttered environments. Furthermore, the system demonstrates zero-shot sim-to-real transfer, achieving successful grasps across clean, cluttered, and low-light environments under various object motions.
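The event-frame-to-action pipeline the abstract describes can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the frame resolution, the event encoding, and the toy linear policy standing in for the learned deep reinforcement learning network are not taken from the paper.

```python
import numpy as np

# Assumed event-frame resolution and action dimensionality (not from the paper).
FRAME_H, FRAME_W = 64, 64   # hypothetical event-frame size
ACTION_DIM = 3              # 3D Cartesian velocity command (dx, dy, dz)

rng = np.random.default_rng(0)
# Toy linear layer standing in for the trained policy network.
W = rng.standard_normal((FRAME_H * FRAME_W, ACTION_DIM)) * 0.01

def events_to_frame(events, h=FRAME_H, w=FRAME_W):
    """Accumulate (x, y, polarity) events into a signed 2D frame.
    This polarity-sum encoding is one common choice, assumed here."""
    frame = np.zeros((h, w), dtype=np.float32)
    for x, y, p in events:
        frame[y % h, x % w] += 1.0 if p > 0 else -1.0
    return frame

def policy(frame):
    """Map an event frame to a bounded 3D Cartesian action."""
    a = frame.reshape(-1) @ W
    return np.tanh(a)  # squash each axis into [-1, 1]

# A few synthetic events from a moving object edge.
events = [(10, 20, 1), (11, 20, 1), (30, 40, -1)]
action = policy(events_to_frame(events))
print(action.shape)
```

In the actual system the linear layer would be a convolutional policy trained end to end with reinforcement learning, and the bounded action would drive the eye-in-hand end effector in Cartesian space at each control step.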