Efficient Eye Tracking Using Near-Eye Event Cameras: From Event-based Detection to Rapid Updates

Master Thesis (2025)
Author(s)

J. Liu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

L. A.N. Lan – Mentor (TU Delft - Embedded Systems)

Q. Wang – Mentor (TU Delft - Embedded Systems)

L. Du – Mentor (TU Delft - Embedded Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
26-08-2025
Awarding Institution
Delft University of Technology
Project
Thesis
Programme
Computer and Embedded Systems Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Eye tracking is a cornerstone technology for next-generation human-computer interaction, particularly in Extended Reality (XR) and healthcare applications. However, traditional frame-based eye tracking systems are constrained by latency, power consumption, and motion blur. Event cameras offer a promising alternative thanks to their high temporal resolution, high dynamic range, and low data redundancy, but existing event-based methods often struggle to balance tracking accuracy, computational efficiency, and robustness, especially on resource-constrained mobile hardware.

This thesis addresses these challenges by proposing a novel, purely event-based eye tracking pipeline designed for high-frequency performance and robust accuracy within a strict computational budget. The pipeline accepts only event streams and estimates the pupil region in the field of view. The core contribution is a dual-state framework that combines a deep learning-based pupil detector with a lightweight, rapid template updater. For robust detection, a lightweight, attention-augmented segmentation network named PupilUNet is developed. It leverages a truncated MobileNetV3 Small encoder and a parameter-free attention mechanism to accurately segment the pupil boundary from Speed-Invariant Time Surface (SITS) representations, which provide a stable input by normalizing for motion speed. To overcome the scarcity of annotated data, a data augmentation framework is introduced that expands limited initial labels into a large-scale training dataset. Once a high-confidence pupil template is detected, the system transitions to a rapid updating mode, employing an optimized, vectorized point-to-edge matching algorithm to track the pupil at kilohertz frequencies with millisecond latency. A dynamic control logic monitors tracking quality and seamlessly reverts to the robust detection mode when necessary, ensuring both speed and resilience.
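To make the dual-state control flow concrete, the sketch below outlines the idea in plain Python with NumPy. It is a conceptual outline only, not the thesis implementation: the function names, thresholds, and the crude geometric stand-ins for PupilUNet detection and point-to-edge matching are illustrative assumptions.

# Conceptual sketch of the dual-state control loop described above.
# All names (detect_pupil, rapid_update, QUALITY_THRESHOLD, ...) are
# illustrative placeholders, not the thesis implementation.
import numpy as np

CONF_THRESHOLD = 0.9     # assumed confidence needed to accept a detection
QUALITY_THRESHOLD = 0.8  # assumed quality below which we re-detect

def detect_pupil(events):
    """Stand-in for the detection path: the thesis builds a Speed-Invariant
    Time Surface from the events and runs PupilUNet; here we simply fit a
    circle-like template to the event coordinates."""
    center = events.mean(axis=0)          # crude pupil-center estimate
    radius = float(np.std(events))        # crude pupil-radius estimate
    confidence = 0.95                     # placeholder confidence score
    return {"center": center, "radius": radius}, confidence

def rapid_update(template, events):
    """Stand-in for the vectorized point-to-edge matcher: nudge the template
    toward the incoming events and report a matching quality."""
    residuals = events - template["center"]
    edge_error = np.linalg.norm(residuals, axis=1) - template["radius"]
    template["center"] = template["center"] + residuals.mean(axis=0) * 0.1
    quality = float(np.exp(-np.abs(edge_error).mean() / max(template["radius"], 1e-6)))
    return template, quality

def track(event_batches):
    """Dual-state loop: detect until confident, then update rapidly,
    reverting to robust detection whenever tracking quality drops."""
    template, mode = None, "detect"
    for events in event_batches:
        if mode == "detect":
            template, conf = detect_pupil(events)
            if conf >= CONF_THRESHOLD:
                mode = "update"
        else:
            template, quality = rapid_update(template, events)
            if quality < QUALITY_THRESHOLD:
                mode = "detect"
        yield template["center"], mode

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batches = [rng.normal(loc=(64, 48), scale=5, size=(200, 2)) for _ in range(5)]
    for center, mode in track(batches):
        print(mode, np.round(center, 2))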

Experimental results on the EV-Eye dataset validate the pipeline’s effectiveness. The PupilUNet detector achieves a P5 accuracy of 96.3% (pupil center error < 5 pixels), while the rapid updater operates with an average latency of approximately 1 ms. The lightweight PupilUNet model contains only 0.177 M parameters and requires just 0.553 GFLOPs per inference. The fully integrated system sustains a P5 accuracy of 85.2% while achieving a peak tracking frequency of over 960 Hz. This work demonstrates a practical and efficient solution that successfully navigates the trade-off between accuracy and latency, establishing a new baseline for high-performance, event-based eye tracking on mobile and embedded systems.

Files

TUD_MSc_Thesis_J.Liu.pdf
(pdf | 2.33 MB)
License info not available