🤖 AI Summary
To improve pedestrian trajectory prediction accuracy for autonomous driving, this paper proposes ASTRA, a lightweight, scene-aware, Transformer-based model. Methodologically: (i) we introduce an agent-scene-aware embedding mechanism that jointly models scene context, spatial dynamics, social interactions, and temporal evolution; (ii) we design a weighted penalty loss function to prioritize short-term prediction accuracy and mitigate error accumulation; and (iii) the model generalizes across bird’s-eye view (BEV) and ego-vehicle view (EVV) perspectives. The architecture comprises a U-Net-based feature extractor, a graph-aware Transformer encoder, and a conditional variational autoencoder (CVAE) for stochastic prediction, so the model supports both deterministic and stochastic outputs. On ETH-UCY, the method improves on prior state-of-the-art results by an average of 27% in the deterministic setting and 10% in the stochastic setting; on PIE, it improves by 26%. It achieves this with roughly one-seventh the parameters of the existing state-of-the-art model.
📝 Abstract
We present ASTRA (A Scene-aware TRAnsformer-based model for trajectory prediction), a lightweight pedestrian trajectory forecasting model that integrates scene context, spatial dynamics, social inter-agent interactions, and temporal progression for precise forecasting. We use a U-Net-based feature extractor, whose latent vector representation captures the scene, and a graph-aware transformer encoder to capture social interactions. These components are combined to learn an agent-scene-aware embedding, enabling the model to learn spatial dynamics and forecast the future trajectories of pedestrians. The model is designed to produce both deterministic and stochastic outcomes, with the stochastic predictions generated by incorporating a Conditional Variational Auto-Encoder (CVAE). We also propose a simple yet effective weighted penalty loss function, which helps yield predictions that outperform a wide array of state-of-the-art deterministic and generative models. ASTRA demonstrates an average improvement of 27% in the deterministic setting and 10% in the stochastic setting on the ETH-UCY dataset, and a 26% improvement on the PIE dataset, with seven times fewer parameters than the existing state-of-the-art model (see Figure 1). Additionally, the model's versatility allows it to generalize across different perspectives, such as Bird's Eye View (BEV) and Ego-Vehicle View (EVV).
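The abstract does not give the form of the weighted penalty loss. As a rough illustration only, the sketch below shows one way a time-weighted displacement loss could favor early prediction steps, in line with the summary's description of prioritizing short-term accuracy. The function name `weighted_penalty_loss`, the geometric `decay` schedule, and the normalization are all assumptions made for this example, not the paper's formulation.

```python
import numpy as np

def weighted_penalty_loss(pred, target, decay=0.9):
    """Time-weighted mean displacement error for one trajectory.

    pred, target: arrays of shape (T, 2) holding predicted and ground-truth
    (x, y) positions for T future time steps.
    decay: hypothetical hyperparameter; values below 1 give earlier steps
    larger weights. This schedule is an assumption, not taken from the paper.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)

    # Euclidean displacement error at each future time step, shape (T,)
    step_err = np.linalg.norm(pred - target, axis=-1)

    # Geometrically decaying weights, normalized to sum to 1
    weights = decay ** np.arange(step_err.shape[0])
    weights = weights / weights.sum()

    return float(np.sum(weights * step_err))

# Usage: the same 1-unit error costs more at the first step than at the last.
target = np.zeros((4, 2))
early = target.copy()
early[0, 0] = 1.0
late = target.copy()
late[-1, 0] = 1.0
print(weighted_penalty_loss(early, target), weighted_penalty_loss(late, target))
```

With `decay=1.0` the weights become uniform and the function reduces to the ordinary average displacement over the horizon, which makes the effect of the weighting easy to compare against an unweighted baseline.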