Efficient Eye Tracking Using Near-Eye Event Cameras: From Event-based Detection to Rapid Updates
J. Liu (TU Delft - Electrical Engineering, Mathematics and Computer Science)
L. A. N. Lan – Mentor (TU Delft - Embedded Systems)
Q. Wang – Mentor (TU Delft - Embedded Systems)
L. Du – Mentor (TU Delft - Embedded Systems)
Abstract
Eye tracking is a cornerstone technology for next-generation human-computer interaction, particularly in Extended Reality (XR), as well as in healthcare applications. However, traditional frame-based eye tracking systems are constrained by latency, power consumption, and motion blur. Event cameras offer a promising alternative thanks to their high temporal resolution, high dynamic range, and low data redundancy, but existing event-based methods often struggle to balance tracking accuracy, computational efficiency, and robustness, especially on resource-constrained mobile hardware.
This thesis addresses these challenges by proposing a novel, purely event-based eye tracking pipeline designed for high-frequency performance and robust accuracy within a strict computational budget. The pipeline accepts only event streams as input and estimates the pupil region in the field of view. The core contribution is a dual-state framework that combines a deep learning-based pupil detector with a lightweight, rapid template updater. For robust detection, a compact, attention-augmented segmentation network, named PupilUNet, is developed. It leverages a truncated MobileNetV3 Small encoder and a parameter-free attention mechanism to accurately segment the pupil boundary from Speed-Invariant Time Surface (SITS) representations, which provide a stable input by normalizing for motion speed. To overcome the scarcity of annotated data, a comprehensive augmentation framework is introduced to build a large-scale training dataset from limited initial labels. Once a high-confidence pupil template is detected, the system transitions to a rapid updating mode, employing an optimized, vectorized point-to-edge matching algorithm to track the pupil at kilohertz frequencies with millisecond latency. A dynamic control logic monitors tracking quality and seamlessly reverts to the robust detection mode when necessary, ensuring both speed and resilience.
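The dual-state control flow can be sketched in a few lines of Python. This is a minimal illustration rather than the thesis implementation: the callables (build_sits, pupilunet_detect, point_to_edge_update), the two thresholds, and the confidence/quality scores are hypothetical stand-ins for the components described above.

    # Illustrative thresholds; the values used in the thesis are not
    # given in this abstract.
    DETECT_CONF_THRESH = 0.9   # min confidence to accept a new pupil template
    UPDATE_QUAL_THRESH = 0.5   # min match quality to stay in rapid-update mode

    def track(event_batches, build_sits, pupilunet_detect, point_to_edge_update):
        """Dual-state loop: robust detection <-> rapid template updates.

        event_batches        -- iterable of event arrays (x, y, t, polarity)
        build_sits           -- builds a Speed-Invariant Time Surface from events
        pupilunet_detect     -- segmentation network; returns (template, confidence)
        point_to_edge_update -- lightweight matcher; returns (template, quality)
        All four callables are hypothetical stand-ins, injected as arguments
        so the sketch stays self-contained.
        """
        template = None  # current pupil template (e.g., pupil edge points)
        for events in event_batches:
            if template is None:
                # Detection mode: run PupilUNet on a SITS representation and
                # adopt the result only when confidence is high enough.
                candidate, conf = pupilunet_detect(build_sits(events))
                if conf >= DETECT_CONF_THRESH:
                    template = candidate
            else:
                # Rapid-update mode: align the template to fresh events with
                # vectorized point-to-edge matching (~1 ms per batch).
                template, quality = point_to_edge_update(template, events)
                if quality < UPDATE_QUAL_THRESH:
                    template = None  # quality degraded: revert to detection
            yield template

The design point this sketch captures is that the expensive network runs only when no trusted template exists, while the cheap matcher carries the kilohertz steady-state load.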
Experimental results on the EV-Eye dataset validate the pipeline’s effectiveness. The PupilUNet detector achieves a P5 accuracy of 96.3% (pupil center error < 5 pixels), while the rapid updater operates with an average latency of approximately 1 ms. The lightweight PupilUNet model contains only 0.177 M parameters and requires just 0.553 GFLOPs per inference. The fully integrated system sustains a P5 accuracy of 85.2% while reaching a peak tracking frequency of over 960 Hz. This work demonstrates a practical and efficient solution that successfully navigates the trade-off between accuracy and latency, establishing a new baseline for high-performance, event-based eye tracking on mobile and embedded systems.