🤖 AI Summary
To address insufficient pedestrian trajectory prediction accuracy and the underutilization of pose information in autonomous driving, this paper proposes a stepwise goal-driven network that integrates skeletal keypoints and joint angles. Methodologically, it introduces, for the first time, the fusion of human pose angular features with a goal-oriented stepwise modeling mechanism, augmented by horizontal flipping of video frames to improve generalization from limited data. Skeletal sequences are extracted via pose estimation, and inter-segment joint angles are computed from them; the SGNet architecture is extended to support multimodal spatiotemporal feature fusion and end-to-end trajectory regression. Evaluated on the JAAD and PIE datasets, the method achieves state-of-the-art performance, significantly outperforming the original SGNet. It improves both trajectory prediction accuracy and the timeliness of collision warnings, demonstrating practical utility for safety-critical autonomous driving applications.
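The joint-angle features described above can be illustrated with a small sketch. The paper's exact angle convention is not specified here, so the `joint_angle` helper below (the angle at the middle keypoint of two adjacent body segments, e.g. hip–knee–ankle) is an assumption for illustration:

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle in degrees at keypoint b, formed by segments b->a and b->c.

    a, b, c are 2D pixel coordinates from a pose estimator.
    (Hypothetical helper; the paper's angle definition may differ.)
    """
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    # Clip to guard against floating-point values slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Example: a right angle at the knee-like middle keypoint.
angle = joint_angle((0.0, 1.0), (0.0, 0.0), (1.0, 0.0))  # ~90 degrees
```

Per frame, one such angle per articulated joint yields a compact vector that can be concatenated with the bounding-box features before temporal encoding.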
📝 Abstract
Predicting pedestrian trajectories is essential for autonomous driving systems: accurate predictions help prevent collisions, anticipate crossing intent, and improve overall system efficiency, thereby enhancing safety and supporting informed decision-making. In this study, we present SGNetPose+, an enhancement of the SGNet architecture that integrates skeleton information or body segment angles with bounding boxes to predict pedestrian trajectories from video data and avoid hazards in autonomous driving. Skeleton information is extracted with a pose estimation model, and joint angles are computed from the extracted joints. We also apply temporal data augmentation by horizontally flipping video frames, which increases the dataset size and improves performance. Our approach achieves state-of-the-art results on the JAAD and PIE datasets using pose data together with bounding boxes, outperforming the SGNet model. Code is available on GitHub: SGNetPose+.
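The horizontal-flip augmentation amounts to mirroring pixel x-coordinates about the frame width, applied consistently to both bounding boxes and pose keypoints so the two modalities stay aligned. The helper names below are illustrative, not taken from the released code:

```python
def hflip_bbox(bbox, frame_width):
    """Mirror an (x1, y1, x2, y2) pixel bounding box about the vertical axis.

    Hypothetical helper: after flipping, the old right edge becomes the
    new left edge, so x-coordinates are swapped to keep x1 <= x2.
    """
    x1, y1, x2, y2 = bbox
    return (frame_width - x2, y1, frame_width - x1, y2)

def hflip_keypoints(keypoints, frame_width):
    """Mirror a list of (x, y) pose keypoints about the vertical axis."""
    return [(frame_width - x, y) for (x, y) in keypoints]

# A full trajectory sample is augmented by flipping every frame's
# box and skeleton with the same frame width.
flipped_box = hflip_bbox((10, 5, 30, 40), 100)        # (70, 5, 90, 40)
flipped_kps = hflip_keypoints([(10, 5), (30, 40)], 100)
```

Note that a semantically correct flip would also swap left/right keypoint labels (e.g. left knee with right knee); joint angles computed at corresponding joints are unchanged by the mirror, which is one reason angle features combine cleanly with this augmentation.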