Channel Selection for Faster Deep Learning-based Gaze Estimation in the Frequency Domain

A frequency domain approach to reducing latency in deep learning gaze estimation

Bachelor Thesis (2023)
Author(s)

T.J. Penning (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

G. Lan – Mentor (TU Delft - Embedded Systems)

L. Du – Mentor (TU Delft - Embedded Systems)

Xucong Zhang – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Thijs Penning
Publication Year
2023
Language
English
Graduation Date
28-06-2023
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Gaze estimation is an important area of research used in a wide range of applications. However, existing models trained for gaze estimation often suffer from high computational costs. In this study, frequency domain channel selection techniques were explored to decrease these costs by reducing the size of the input data. The main research objective was to investigate the impact of channel selection on the latency and accuracy of frequency domain gaze estimation. Channel selection methods used in related research were adapted and applied to the domain of gaze estimation. The evaluation was conducted on two popular network architectures used in this field, namely the AlexNet and ResNet-18. Multiple channel selection models were designed for each architecture and compared to a traditional RGB approach with the same network structure. Experimental results showed significant speedups during training, calibration, and inference with marginal accuracy loss. The top-performing models achieved speedups of 3.3, 4.0, and 1.35 for the AlexNet, and 1.5, 1.7, and 1.35 for the ResNet-18, respectively. Accompanying these speedups, the AlexNet model's error increased by only 0.08 degrees compared to the traditional RGB approach, while the ResNet-18 model lost around 0.44 degrees. All the code used in this research is publicly available on GitHub (https://github.com/tpenning/DLFDFaceGazeEstimation).
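The core idea summarized above, transforming the input into frequency channels and keeping only a subset before the network sees it, can be sketched as follows. This is a minimal illustration, not the thesis code: it assumes a blockwise 2-D DCT per colour plane (as in common frequency-domain input pipelines) and a simple energy-based heuristic for picking which channels to keep; the function names and the selection rule are illustrative.

```python
import numpy as np
from scipy.fftpack import dct


def to_frequency_channels(image, block=8):
    """Split each colour plane into block x block tiles, apply a 2-D DCT,
    and regroup coefficients so every frequency becomes its own channel.
    `image` is an (H, W, C) array with H and W divisible by `block`.
    The result has shape (H/block, W/block, block*block*C)."""
    h, w, c = image.shape
    tiles = image.reshape(h // block, block, w // block, block, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)  # (by, bx, block, block, C)
    # 2-D DCT over the two within-tile axes
    coeffs = dct(dct(tiles, axis=2, norm='ortho'), axis=3, norm='ortho')
    return coeffs.reshape(h // block, w // block, block * block * c)


def select_channels(freq, k):
    """Keep the k frequency channels with the largest mean absolute
    energy (a simple static selection heuristic, stated as an
    assumption here; the thesis evaluates several selection methods)."""
    energy = np.abs(freq).mean(axis=(0, 1))
    keep = np.sort(np.argsort(energy)[-k:])
    return freq[:, :, keep]


# Usage: a 224x224 RGB face crop becomes a 28x28 tensor with 192
# frequency channels; keeping e.g. 24 of them shrinks the network input.
img = np.random.rand(224, 224, 3)
freq = to_frequency_channels(img)       # (28, 28, 192)
reduced = select_channels(freq, 24)     # (28, 28, 24)
```

Because the spatial resolution drops by the block size and most high-frequency channels are discarded, the first convolutional layers operate on far less data, which is the source of the training and inference speedups reported in the abstract.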

Files

CSE3000_Final_Paper.pdf
(pdf | 0.985 Mb)
License info not available