Channel Selection for Faster Deep Learning-based Gaze Estimation in the Frequency Domain

A frequency domain approach to reducing latency in deep learning gaze estimation

Bachelor Thesis (2023)
Author(s)

T.J. Penning (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

G. Lan – Mentor (TU Delft - Embedded Systems)

L. Du – Mentor (TU Delft - Embedded Systems)

Xucong Zhang – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Thijs Penning
Publication Year
2023
Language
English
Graduation Date
28-06-2023
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Gaze estimation is an important area of research used in a wide range of applications. However, existing models trained for gaze estimation often suffer from high computational costs. In this study, frequency domain channel selection techniques were explored to decrease these costs by reducing the size of the input data. The main research objective was to investigate the impact of channel selection on the latency and accuracy of frequency domain gaze estimation. Channel selection methods used in related research were adapted and applied to the domain of gaze estimation. The evaluation was conducted on two popular network architectures used in this field, namely the AlexNet and ResNet-18. Multiple channel selection models were designed for each architecture and compared to a traditional RGB approach with the same network structure. Experimental results showed significant speedups during training, calibration, and inference with marginal accuracy loss. The top-performing models achieved speedups of 3.3, 4.0, and 1.35 for the AlexNet, and 1.5, 1.7, and 1.35 for the ResNet-18, respectively. Accompanying these speedups, the AlexNet model's error increased by only 0.08 degrees compared to the traditional RGB approach, while the ResNet-18 model lost around 0.44 degrees. All the code used in this research is publicly available on GitHub (https://github.com/tpenning/DLFDFaceGazeEstimation).
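The core idea summarized above, transforming the input into frequency channels and keeping only a subset before the network sees it, can be sketched as follows. This is a minimal illustration, not the thesis code: it assumes a blockwise 2-D DCT per colour plane (as in common frequency-domain input pipelines) and a simple energy-based heuristic for picking which channels to keep; the function names and the selection rule are illustrative.

```python
import numpy as np
from scipy.fftpack import dct


def to_frequency_channels(image, block=8):
    """Split each colour plane into block x block tiles, apply a 2-D DCT,
    and regroup coefficients so every frequency becomes its own channel.
    `image` is an (H, W, C) array with H and W divisible by `block`.
    The result has shape (H/block, W/block, block*block*C)."""
    h, w, c = image.shape
    tiles = image.reshape(h // block, block, w // block, block, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)  # (by, bx, block, block, C)
    # 2-D DCT over the two within-tile axes
    coeffs = dct(dct(tiles, axis=2, norm='ortho'), axis=3, norm='ortho')
    return coeffs.reshape(h // block, w // block, block * block * c)


def select_channels(freq, k):
    """Keep the k frequency channels with the largest mean absolute
    energy (a simple static selection heuristic, stated as an
    assumption here; the thesis evaluates several selection methods)."""
    energy = np.abs(freq).mean(axis=(0, 1))
    keep = np.sort(np.argsort(energy)[-k:])
    return freq[:, :, keep]


# Usage: a 224x224 RGB face crop becomes a 28x28 tensor with 192
# frequency channels; keeping e.g. 24 of them shrinks the network input.
img = np.random.rand(224, 224, 3)
freq = to_frequency_channels(img)       # (28, 28, 192)
reduced = select_channels(freq, 24)     # (28, 28, 24)
```

Because the spatial resolution drops by the block size and most high-frequency channels are discarded, the first convolutional layers operate on far less data, which is the source of the training and inference speedups reported in the abstract.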

Files

CSE3000_Final_Paper.pdf
(pdf | 0.985 Mb)
License info not available