Visual context plays a key role in many computer vision tasks, and the performance of eye/gaze-tracking methods also benefits from it. However, the contextual information (e.g., a full face image) is much larger than the primary input, i.e., the cropped image of the eye. This adds large computational costs to the algorithm and makes it inefficient, severely limiting its utility in real-time applications. In this paper, we perform a (computational) cost vs. benefit analysis of various input types that include context, with a view towards an efficient gaze-tracking system. We further study the effect of an alternate ranking-loss-based training strategy. Finally, we demonstrate some practical calibration techniques that can convert gaze vectors into points-on-screen, an important application that is often overlooked in the literature. We examine how data-efficient these techniques are in terms of how well they utilise expensive calibration data.