EFE: End-to-end Frame-to-Gaze Estimation

Conference Paper (2023)

Author(s)

Haldun Balim (ETH Zürich)

Seonwook Park (Lunit Inc.)

Xi Wang (ETH Zürich)

Xucong Zhang (TU Delft - Pattern Recognition and Bioinformatics)

Otmar Hilliges (ETH Zürich)

Research Group

Pattern Recognition and Bioinformatics

DOI related publication

https://doi.org/10.1109/CVPRW59228.2023.00269 (final published version)

To reference this document use

https://resolver.tudelft.nl/uuid:3ee9d601-36ea-4e31-8878-524a58a6b0de

Publication Year

2023

Language

English

Pages (from-to)

2688-2697

ISBN (print)

979-8-3503-0250-9

ISBN (electronic)

979-8-3503-0249-3

Event

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2023-06-17 - 2023-06-24), Vancouver, Canada

Collections

Institutional Repository

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Despite recent progress in learning-based gaze estimation, most methods require one or more eye or face region crops as input and produce a gaze direction vector as output. Cropping yields higher resolution in the eye regions and leaves fewer confounding factors (such as clothing and hair), both of which are believed to benefit final model performance. However, this eye/face patch cropping process is expensive, error-prone, and implemented differently from method to method. In this paper, we propose a frame-to-gaze network that directly predicts both the 3D gaze origin and the 3D gaze direction from the raw camera frame, without any face or eye cropping. Our method demonstrates that direct gaze regression from a raw frame downscaled from FHD/HD to VGA/HVGA resolution is possible despite the challenge of having very few pixels in the eye region. The proposed method achieves results comparable to state-of-the-art methods in Point-of-Gaze (PoG) estimation on three public gaze datasets (GazeCapture, MPIIFaceGaze, and EVE) and generalizes well to extreme camera view changes.
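
As a rough illustration of how a 3D gaze origin and a 3D gaze direction relate to a Point-of-Gaze, the sketch below intersects a gaze ray with a planar screen. It is not taken from the paper: the function name `point_of_gaze`, the default screen plane z = 0 in camera coordinates, and the millimetre units are all assumptions made for this example.

```python
def point_of_gaze(origin, direction,
                  plane_point=(0.0, 0.0, 0.0),
                  plane_normal=(0.0, 0.0, 1.0)):
    """Intersect the gaze ray origin + t * direction with a plane.

    All arguments are 3-element sequences in one shared coordinate system.
    The default plane (z = 0) is an assumption for this sketch only.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    denom = dot(plane_normal, direction)
    if abs(denom) < 1e-9:
        raise ValueError("gaze ray is parallel to the screen plane")

    # Solve dot(plane_normal, origin + t * direction - plane_point) = 0 for t.
    offset = tuple(p - o for p, o in zip(plane_point, origin))
    t = dot(plane_normal, offset) / denom
    if t < 0:
        raise ValueError("screen plane lies behind the gaze origin")

    return tuple(o + t * d for o, d in zip(origin, direction))


# Hypothetical values: eye 500 mm in front of the plane, looking towards it.
pog = point_of_gaze(origin=(0.0, 0.0, 500.0), direction=(0.1, -0.2, -1.0))
print(pog)  # approximately (50.0, -100.0, 0.0)
```

The direction vector does not need to be normalized here, because its length only rescales the ray parameter t and leaves the intersection point unchanged.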

Files

EFE_End_to_end_Frame_to_Gaze_E... (pdf)

(pdf | 4 MB)

- Embargo expired on 14-02-2024

License info not available