Towards High Performance Low Complexity Calibration in Appearance Based Gaze Estimation

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 1 (15 Jan. 2023), pp. 1174-1188
Main Author: Chen, Zhaokang (Author)
Other Authors: Shi, Bertram E.
Format: Online article
Language: English
Published: 2023
Parent work: IEEE Transactions on Pattern Analysis and Machine Intelligence
Subjects: Journal Article
Description
Abstract: Appearance-based gaze estimation from RGB images provides relatively unconstrained gaze tracking using commonly available hardware. The accuracy of subject-independent models is limited partly by small intra-subject and large inter-subject variations in appearance, and partly by a latent subject-dependent bias. To improve estimation accuracy, we previously proposed a gaze decomposition method that decomposes the gaze angle into the sum of a subject-independent gaze estimate from the image and a subject-dependent bias. Estimating the bias from images outperforms previously proposed calibration algorithms unless the amount of calibration data is prohibitively large. This paper extends that work with a more complete characterization of the interplay between the complexity of the calibration dataset and estimation accuracy. In particular, we analyze the effects of the number of gaze targets, the number of images per gaze target, and the number of head positions in the calibration data using a new NISLGaze dataset, which is well suited for analyzing these effects because it includes more diversity in head positions and orientations for each subject than other datasets. A better understanding of these factors enables low-complexity, high-performance calibration. Our results indicate that a single gaze target and a single head position are sufficient for high-quality calibration, although it is useful to include variability in head orientation while the subject gazes at the target. Our proposed estimator based on these studies, GEDDnet, outperforms state-of-the-art methods by more than 6.3%. One surprising finding is that the same estimator yields the best performance both with and without calibration. This is convenient, since the estimator works well "straight out of the box" but can be improved by calibration if needed. However, it seems to violate the conventional wisdom that training and test conditions must be matched. To better understand the reasons, we provide a new theoretical analysis that specifies the conditions under which this can be expected. The dataset is available at http://nislgaze.ust.hk. Source code is available at https://github.com/HKUST-NISL/GEDDnet
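
The gaze-decomposition idea summarized above admits a compact illustration. The following is a minimal sketch, not the released GEDDnet code (see the GitHub link above): the network f_theta, the helper names estimate_gaze and calibrate_bias, and the two-angle (yaw, pitch) gaze representation are illustrative assumptions. It shows the decomposition of the gaze angle into a subject-independent estimate plus a subject-dependent bias, and the single-target calibration the abstract finds sufficient: the bias is taken as the mean residual between the known target direction and the uncalibrated predictions over the calibration frames.

    import numpy as np

    def estimate_gaze(f_theta, image, bias=None):
        # Decomposition: gaze = subject-independent estimate + subject bias.
        # With bias=None the estimator runs uncalibrated, i.e. "straight
        # out of the box", as described in the abstract.
        base = f_theta(image)
        return base if bias is None else base + bias

    def calibrate_bias(f_theta, images, target_gaze):
        # Single-target calibration: the subject fixates one known target,
        # ideally with some head-orientation variability across frames.
        # The bias is the mean residual between the target direction and
        # the raw (subject-independent) predictions.
        residuals = [target_gaze - f_theta(img) for img in images]
        return np.mean(residuals, axis=0)  # (yaw, pitch) bias

    # Hypothetical usage with a stand-in estimator returning (yaw, pitch):
    rng = np.random.default_rng(0)
    f_theta = lambda img: np.array([2.0, -1.0]) + 0.1 * rng.standard_normal(2)
    frames = [object()] * 9  # placeholder for nine calibration images
    bias = calibrate_bias(f_theta, frames, target_gaze=np.array([3.0, -0.5]))
    print(estimate_gaze(f_theta, frames[0], bias))

Averaging the residual over several frames of the same fixation while head orientation varies is consistent with the abstract's finding that one gaze target and one head position suffice, but orientation variability during the fixation improves the bias estimate.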
Description: Date Completed 05.04.2023
Date Revised 05.04.2023
Published: Print-Electronic
Citation Status: PubMed-not-MEDLINE
ISSN: 1939-3539
DOI: 10.1109/TPAMI.2022.3148386