🤖 AI Summary
This study addresses the high cost and environmental constraints of conventional eye tracking by inferring real-time gaze trajectories solely from keystroke logs. We propose a deep neural network that maps keypresses to gaze. Its core contributions are: (1) a user-specific low-dimensional parameter embedding that captures inter-individual variability in oculomotor behavior; (2) a temporal alignment loss that explicitly models the coupling between keystrokes and fixations; and (3) a hybrid training paradigm that jointly optimizes on synthetic keyboard interaction data and real gaze annotations. Evaluated on touchscreen typing, a highly noisy scenario with low temporal determinism, the model significantly outperforms existing baselines, improving scanpath reconstruction accuracy by 18.7%–32.4% (measured via DTW distance and saccade-direction accuracy). This enables reliable, fine-grained gaze estimation without dedicated eye-tracking hardware.
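The DTW distance used above to score scanpath reconstruction can be illustrated with a minimal sketch. This is a generic dynamic-time-warping implementation over 2D fixation sequences; the function name and toy scanpaths are our own illustration, not code from the paper:

```python
import numpy as np

def dtw_distance(path_a, path_b):
    """Dynamic time warping distance between two scanpaths,
    each an (N, 2) array of (x, y) fixation coordinates.
    Lower is better; warping makes it robust to differences
    in fixation count and timing."""
    n, m = len(path_a), len(path_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two fixations
            d = np.linalg.norm(path_a[i - 1] - path_b[j - 1])
            # Accumulate over the cheapest warping step
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[n, m]
```

Because of the warping, a predicted scanpath that revisits the same key twice while the ground truth lingers there once still scores a distance of zero, which is why DTW suits noisy touchscreen data better than a point-by-point error.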
📝 Abstract
We present a model for inferring where users look during interaction from keypress data alone. Given a key log, it outputs a scanpath describing, moment by moment, how the user's eyes moved while entering those keys. The model can serve as a proxy for human data in cases where collecting real eye-tracking data is expensive or impossible. Our technical insight is three-fold: first, we present an inference architecture that accounts for the individual characteristics of the user, inferred as a low-dimensional parameter vector; second, we present a novel loss function for synchronizing inferred eye movements with the keypresses; third, we train the model using a hybrid approach on both human data and synthetically generated data. The approach can be applied in interactive systems where predictive models of user behavior are available. We report results from an evaluation in the challenging case of touchscreen typing, where the model accurately inferred real eye movements.
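The synchronization loss is described only at a high level. One way such a penalty could look, as a toy sketch, is to penalize the gap between each keypress time and the nearest predicted fixation onset; the function name and formulation here are our illustration under that assumption, not the paper's actual loss:

```python
import numpy as np

def alignment_loss(fix_times, key_times):
    """Toy key-gaze synchronization penalty (illustrative only).

    fix_times: 1D array of predicted fixation onset times.
    key_times: 1D array of observed keypress times.
    For each keypress, take the squared gap to the nearest
    predicted fixation onset, then average over keypresses.
    """
    # (num_keys, num_fixations) matrix of absolute time gaps
    gaps = np.abs(key_times[:, None] - fix_times[None, :]).min(axis=1)
    return float(np.mean(gaps ** 2))
```

A loss of zero means every keypress coincides with some predicted fixation; in a trainable model one would replace the hard `min` with a differentiable soft minimum.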