GazeHELL: Gaze Estimation with Hybrid Encoders and Localised Losses with weighing


Shubham Dokania (Mercedes-Benz R&D India), Vasudev Singh (Mercedes-Benz Research & Development India), Shuaib Ahmed (Mercedes-Benz R&D India)
The 35th British Machine Vision Conference

Abstract

In the pursuit of robust eye gaze estimation, traditional approaches often grapple with the limitations of either spatial granularity or model interpretability. This paper introduces a dual-architecture framework that combines the strengths of Vision Transformers (ViT) and convolutional networks to enhance gaze estimation accuracy and reliability. We also propose two novel loss functions to refine our predictions: (1) a differentiable heatmap-based 2D MSE loss that transforms gaze vectors into a spatial heatmap, enhancing the model’s ability to localize gaze with high precision, and (2) a Fourier encoding loss that leverages high-dimensional Fourier features to capture complex spatial relationships more effectively. Additionally, we incorporate auxiliary uncertainty-based task weighting into our losses to provide a measure of confidence alongside gaze estimates, aiming to improve predictions dynamically during training. Our experimental results on the MPIIGaze and RT-GENE datasets demonstrate significant improvements over existing methods and establish a new state of the art on both, with up to 3% improvement on each dataset. This work not only advances the field of eye gaze estimation but also opens new avenues for applying advanced vision techniques in human-computer interaction and beyond.
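
To make the three loss ingredients named in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. The abstract does not give the exact formulations, so everything here is an assumption: the Gaussian rendering of a (pitch, yaw) pair, the power-of-two frequency bands, the `exp(-s) * L + s` weighting form (one common way to write uncertainty-based loss weighting), and all function names and parameters (`gaze_to_heatmap`, `sigma`, `num_bands`, `log_vars`) are hypothetical. Plain Python is used for readability; in practice these expressions would be written in an autodiff framework so that gradients flow through them.

```python
import math


def gaze_to_heatmap(pitch, yaw, size=16, sigma=0.1, limit=math.pi / 2):
    """Render a (pitch, yaw) pair in radians as a size x size Gaussian heatmap.

    Assumption: angles lie in [-limit, limit]. The Gaussian is a smooth
    function of the angles, which is what makes such a rendering differentiable.
    """
    # Map the angles to normalised image coordinates in [0, 1].
    cx = (yaw + limit) / (2 * limit)
    cy = (pitch + limit) / (2 * limit)
    heatmap = []
    for row in range(size):
        y = row / (size - 1)
        heatmap.append([
            math.exp(-((col / (size - 1) - cx) ** 2 + (y - cy) ** 2)
                     / (2 * sigma ** 2))
            for col in range(size)
        ])
    return heatmap


def heatmap_mse(pred_map, target_map):
    """Mean squared error between two heatmaps of equal shape."""
    diffs = [(p - t) ** 2
             for pred_row, target_row in zip(pred_map, target_map)
             for p, t in zip(pred_row, target_row)]
    return sum(diffs) / len(diffs)


def fourier_features(pitch, yaw, num_bands=4):
    """Encode both angles as sin/cos pairs at frequencies 1, 2, 4, ..."""
    feats = []
    for k in range(num_bands):
        freq = 2.0 ** k
        for angle in (pitch, yaw):
            feats.extend([math.sin(freq * angle), math.cos(freq * angle)])
    return feats


def fourier_mse(pred, target, num_bands=4):
    """Mean squared error between the Fourier encodings of two gaze pairs."""
    fp = fourier_features(pred[0], pred[1], num_bands)
    ft = fourier_features(target[0], target[1], num_bands)
    return sum((a - b) ** 2 for a, b in zip(fp, ft)) / len(fp)


def uncertainty_weighted_total(losses, log_vars):
    """Combine losses as sum_i exp(-s_i) * L_i + s_i.

    Each s_i stands in for a learned log-variance: a large s_i down-weights
    its loss, while the additive s_i term penalises inflating it without bound.
    """
    return sum(math.exp(-s) * loss + s for loss, s in zip(losses, log_vars))


# Usage: a predicted and a ground-truth gaze direction, in radians.
pred, target = (0.10, -0.20), (0.12, -0.25)
losses = [
    heatmap_mse(gaze_to_heatmap(*pred), gaze_to_heatmap(*target)),
    fourier_mse(pred, target),
]
total = uncertainty_weighted_total(losses, log_vars=[0.0, 0.0])
print(total)
```

With both log-variances set to zero the weights are 1, so the total reduces to the plain sum of the two losses; in a trained model these values would be learned parameters rather than constants.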

Citation

@inproceedings{Dokania_2024_BMVC,
author    = {Shubham Dokania and Vasudev Singh and Shuaib Ahmed},
title     = {GazeHELL: Gaze Estimation with Hybrid Encoders and Localised Losses with weighing},
booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
publisher = {BMVA},
year      = {2024},
url       = {https://papers.bmvc2024.org/0927.pdf}
}


Copyright © 2024 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
