Channel Selection for Faster Deep Learning-based Gaze Estimation in the Frequency Domain

A frequency domain approach to reducing latency in deep learning gaze estimation


Abstract

Gaze estimation is an important area of research with a wide range of applications. However, existing gaze estimation models often suffer from high computational costs. In this study, frequency domain channel selection techniques were explored to decrease these costs by reducing the size of the input data. The main research objective was to investigate the impact of channel selection on the latency and accuracy of frequency domain gaze estimation. Channel selection methods used in related research were adapted and applied to the domain of gaze estimation. The evaluation was conducted on two popular network architectures in this field, namely AlexNet and ResNet-18. Multiple channel selection models were designed for each architecture and compared to a traditional RGB approach with the same network structure. Experimental results showed significant speedups during training, calibration, and inference with marginal accuracy loss. The corresponding speedups achieved by the top-performing model of each architecture were 3.3, 4.0, and 1.35 for AlexNet, and 1.5, 1.7, and 1.35 for ResNet-18. Alongside these speedups, the error of the AlexNet model increased by only 0.08 degrees compared to the traditional RGB approach, while the error of the ResNet-18 model increased by around 0.44 degrees. All the code used in this research is publicly available on GitHub (https://github.com/tpenning/DLFDFaceGazeEstimation).
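To illustrate the general idea of frequency domain channel selection, the following is a minimal sketch, not the implementation from the repository. It assumes a JPEG-style 8×8 block DCT as the frequency transform, which the abstract does not specify. The function names and the choice of retained channel indices are hypothetical and serve only to show how dropping frequency channels shrinks the network input.

```python
import numpy as np

BLOCK = 8  # assumed block size (JPEG-style); not stated in the abstract


def dct_matrix(n=BLOCK):
    """Return the orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale factor
    return m


def to_frequency_channels(plane):
    """Convert an (H, W) image plane into (H/8, W/8, 64) DCT channels.

    Each 8x8 block becomes one spatial position, and each of its 64
    DCT coefficients becomes one channel (channel 0 is the DC term).
    """
    h, w = plane.shape
    if h % BLOCK or w % BLOCK:
        raise ValueError("height and width must be multiples of 8")
    d = dct_matrix()
    # Split into non-overlapping blocks: (H/8, W/8, 8, 8).
    blocks = plane.reshape(h // BLOCK, BLOCK, w // BLOCK, BLOCK)
    blocks = blocks.transpose(0, 2, 1, 3)
    coeffs = d @ blocks @ d.T  # 2-D DCT applied to every block
    return coeffs.reshape(h // BLOCK, w // BLOCK, BLOCK * BLOCK)


def select_channels(freq, keep):
    """Keep only the listed frequency channels (hypothetical selection)."""
    return freq[..., list(keep)]


# Example: a 224x224 plane becomes 28x28x64, then 28x28x4 after selection.
rng = np.random.default_rng(0)
plane = rng.random((224, 224))
freq = to_frequency_channels(plane)
reduced = select_channels(freq, [0, 1, 8, 9])  # four low-frequency channels
```

In this sketch the selected tensor has 16 times fewer values than the full set of 64 channels, which is the kind of input reduction that channel selection relies on. Which channels to keep, and how many, is the design question the study evaluates.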