🤖 AI Summary
Existing pedestrian trajectory prediction methods struggle to integrate multi-source observational cues and often overlook the scale dependency of future motion, which limits their robustness in complex scenarios. To address these limitations, this work proposes MUSCLE-NET, which combines a Multiscale Multimodal Feature Extraction (MMFE) module with a Multiscale Enhanced Hierarchical Prediction (MEHP) module. MMFE fuses heterogeneous modalities (bounding boxes, velocity, and pose) through a directional cross-modal fusion strategy, while MEHP applies a scale-adaptive prediction mechanism that dynamically selects the cues relevant to each motion scale to mitigate spatial drift. On the JAAD and PIE benchmarks, the method achieves competitive performance and consistent gains over state-of-the-art approaches.
📝 Abstract
Accurate pedestrian trajectory prediction is essential for safe navigation in autonomous driving and intelligent transportation systems. Despite substantial recent progress, most existing approaches do not fully exploit diverse observations and often overlook the scale dependency of future motion, treating multiscale features uniformly regardless of the underlying motion dynamics. This limits their robustness across diverse pedestrian behaviors. To address these challenges, we propose a Predicted-MUltiSCale-Aware Network (MUSCLE-NET) for Pedestrian Trajectory Forecasting that integrates complementary multimodal cues with scale-adaptive prediction mechanisms. The proposed framework is built upon a Multiscale Multimodal Feature Extraction (MMFE) module, which combines multiscale representation, modality-aware recalibration, and directional cross-modal fusion to construct semantically aligned representations from bounding boxes, velocities, and pose information. Building on these features, a Multiscale Enhanced Hierarchical Prediction (MEHP) module performs prediction-aware future-motion refinement via a probabilistic coarse predictor, scale-aligned fusion, and progressive refinement, adaptively selecting scale-relevant cues to mitigate spatial drift. Extensive experiments on the JAAD and PIE benchmarks demonstrate that the proposed MUSCLE-NET achieves competitive performance and consistent gains compared with state-of-the-art trajectory prediction methods.
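The abstract does not say how the multiscale representation inside MMFE is computed. As a rough illustration of what "multiscale" can mean for an observed track, the sketch below average-pools a bounding-box sequence over non-overlapping temporal windows of several lengths. This is an assumed, non-learned stand-in, not the authors' module: the function name `multiscale_views`, the window sizes, and the `(T, 4)` box layout are all hypothetical.

```python
import numpy as np

def multiscale_views(track, scales=(1, 2, 4)):
    """Average-pool a (T, D) track over non-overlapping windows.

    Hypothetical helper for illustration only; the paper's MMFE module
    is learned and is not specified in the abstract.
    """
    track = np.asarray(track, dtype=float)
    num_frames, num_dims = track.shape
    views = {}
    for s in scales:
        # Keep the most recent frames that fill whole windows of length s.
        usable = (num_frames // s) * s
        trimmed = track[num_frames - usable:]
        # Group frames into windows of s and average within each window.
        views[s] = trimmed.reshape(-1, s, num_dims).mean(axis=1)
    return views

# Toy observed track: 8 frames of (x1, y1, x2, y2) boxes (assumed layout).
observed = np.arange(32, dtype=float).reshape(8, 4)
views = multiscale_views(observed)
for s, v in views.items():
    print(s, v.shape)
```

Larger windows smooth out frame-level jitter and keep only the slower motion trend, while the window of length 1 preserves the raw track. A scale-adaptive predictor of the kind the abstract describes would then weight such views according to the predicted motion, which this sketch does not attempt.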