🤖 AI Summary
This study addresses the insufficient modeling of spatial proximity and correlation among alternatives in high-frequency pedestrian trajectory prediction. It formulates the pedestrian's next step as a discrete choice over a grid of speed adjustments and heading changes. The authors apply a Residual Logit (ResLogit) model that retains the interpretable linear utility function of classical discrete choice models while adding learnable residual terms to capture local dependencies among densely spaced alternatives. This approach avoids the manually specified nesting structures required by Generalized Extreme Value (GEV) models. Experiments on naturalistic pedestrian-AV encounters from nuScenes and Argoverse 2 show that ResLogit fits substantially better than multinomial logit and four spatial GEV variants, with prediction errors concentrated in adjacent grid cells, which indicates behaviorally coherent predictions and better capture of spatial correlation.
📝 Abstract
High-frequency forecasting of pedestrian motion during interactions with autonomous vehicles (AVs) can be enhanced through behavioural frameworks, such as discrete choice models, that explicitly account for correlation among similar movement alternatives. We formulate the pedestrian's next-step choice as a spatial discrete choice defined over a grid of speed adjustments and heading changes. Using naturalistic pedestrian-AV encounters from nuScenes and Argoverse 2 (1 s decision interval), we estimate a multinomial logit baseline and four spatial generalized extreme value (GEV) specifications (SCL, GSCL, SCNL, and GSCNL). We then compare them to a residual neural network logit (ResLogit) model that learns cross-alternative effects while retaining an interpretable linear utility component. Across the evaluated data, spatial GEV structures yield only marginal improvements over multinomial logit, whereas ResLogit achieves a substantially better fit and produces behaviourally coherent errors concentrated among neighbouring grid cells. The results suggest that in dense, high-frequency spatial choice sets, learning-based residual corrections can capture proximity-induced correlation more effectively than analyst-specified GEV nesting structures, while maintaining interpretability.
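
To make the grid formulation concrete, here is a minimal Python sketch of a multinomial logit over a speed-by-heading grid, with a residual correction added to the linear utility. Everything specific in it is an assumption for illustration and not taken from the paper: the 3 x 3 grid, the coefficient values, the neighbour weight matrix (a fixed stand-in for weights that would be learned), and the tanh residual form, which may differ from the ResLogit layer the authors estimate.

```python
import math

# Hypothetical 3 x 3 choice grid: speed adjustment (m/s) x heading change (degrees).
SPEED_STEPS = [-0.5, 0.0, 0.5]
HEADING_STEPS = [-15.0, 0.0, 15.0]
ALTERNATIVES = [(s, h) for s in SPEED_STEPS for h in HEADING_STEPS]


def softmax(utilities):
    """Numerically stable logit choice probabilities."""
    top = max(utilities)
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def linear_utility(alt, beta_speed=-1.0, beta_heading=-0.05):
    """Interpretable linear part; the coefficients are made up for illustration."""
    speed, heading = alt
    return beta_speed * abs(speed) + beta_heading * abs(heading)


def neighbour_weights(n_rows, n_cols, scale=0.3):
    """J x J matrix that links each cell to its adjacent grid cells only.

    Assumption: a fixed stand-in for cross-alternative weights that a
    residual layer would learn from data.
    """
    n_alts = n_rows * n_cols
    weights = [[0.0] * n_alts for _ in range(n_alts)]
    for i in range(n_alts):
        for j in range(n_alts):
            row_gap = abs(i // n_cols - j // n_cols)
            col_gap = abs(i % n_cols - j % n_cols)
            if max(row_gap, col_gap) == 1:
                weights[i][j] = scale
    return weights


def residual_term(v, weights):
    """Illustrative residual g_i = tanh(sum_j W_ij * v_j); not the paper's exact layer."""
    return [math.tanh(sum(w * vj for w, vj in zip(row, v))) for row in weights]


v = [linear_utility(alt) for alt in ALTERNATIVES]
W = neighbour_weights(len(SPEED_STEPS), len(HEADING_STEPS))
g = residual_term(v, W)

p_mnl = softmax(v)                                  # baseline multinomial logit
p_res = softmax([vi + gi for vi, gi in zip(v, g)])  # linear utility plus residual

for alt, a, b in zip(ALTERNATIVES, p_mnl, p_res):
    print(f"speed {alt[0]:+.1f}, heading {alt[1]:+.0f}: MNL {a:.3f}  residual {b:.3f}")
```

The linear term stays readable as ordinary logit coefficients, while the residual term lets the utility of one cell depend on the utilities of its grid neighbours, which is the kind of proximity-induced correlation the abstract describes.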