AI Summary
This work proposes a novel multi-scale Transformer architecture that integrates multimodal inputs, such as visual and motion cues, to enhance the accuracy of pedestrian crossing intention prediction for Level 3-4 autonomous driving. For the first time, a video Vision Transformer (ViT) is introduced to this task, leveraging its capacity to capture long-range spatiotemporal dependencies. The effectiveness of the proposed design is rigorously validated through systematic ablation studies. Evaluated on the JAAD dataset, the model achieves state-of-the-art performance, significantly outperforming existing methods across key metrics including Accuracy, AUC, and F1-score. These results demonstrate the potential of the approach to improve pedestrian interaction safety in high-level autonomous driving systems.
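The summary does not spell out how the video ViT tokenizes clips or how modalities are combined, so the following is only a minimal sketch of the general pattern such models use: a clip is split into spatiotemporal "tubelet" tokens, each modality is projected to a shared width, and the token sequences are concatenated before the Transformer. All function names, dimensions, and the early-fusion choice here are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def video_to_tubelets(clip, patch=4, tube=2):
    """Split a video clip (T, H, W, C) into flattened spatiotemporal
    'tubelet' tokens, the tokenization commonly used by video ViTs."""
    T, H, W, C = clip.shape
    assert T % tube == 0 and H % patch == 0 and W % patch == 0
    tokens = []
    for t in range(0, T, tube):
        for y in range(0, H, patch):
            for x in range(0, W, patch):
                tokens.append(clip[t:t + tube, y:y + patch, x:x + patch].reshape(-1))
    return np.stack(tokens)  # (num_tokens, tube * patch * patch * C)

def fuse_modalities(video_tokens, motion_feats, d_model=64, seed=0):
    """Project each modality to a shared embedding width and concatenate
    along the token axis -- one simple early-fusion strategy (assumed here)."""
    rng = np.random.default_rng(seed)
    W_v = rng.standard_normal((video_tokens.shape[1], d_model)) * 0.02
    W_m = rng.standard_normal((motion_feats.shape[1], d_model)) * 0.02
    return np.concatenate([video_tokens @ W_v, motion_feats @ W_m], axis=0)

# Toy inputs: an 8-frame 16x16 RGB crop and per-frame motion cues
# (e.g. a pedestrian bounding box as x, y, w, h -- hypothetical features).
clip = np.random.default_rng(1).random((8, 16, 16, 3))
motion = np.random.default_rng(2).random((8, 4))
tokens = fuse_modalities(video_to_tubelets(clip), motion)
print(tokens.shape)  # 4*4*4 video tokens + 8 motion tokens -> (72, 64)
```

The fused token sequence would then be fed to a standard Transformer encoder; varying the encoder depth and width is one way to obtain the "different sizes" the abstract mentions.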
Abstract
Pedestrian intention prediction is one of the key technologies in the transition from Level 3 to Level 4 autonomous driving. To understand pedestrian crossing behaviour, several elements and features must be taken into consideration to make the roads of tomorrow safer for everybody. We introduce Transformer- and video Vision Transformer-based models of different sizes that use different data modalities. We evaluated our models on the popular pedestrian behaviour dataset JAAD, reaching state-of-the-art performance and surpassing prior results on metrics such as Accuracy, AUC, and F1-score. The advantages brought by different model design choices are investigated via extensive ablation studies.
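For reference, the three metrics reported above can be computed from binary crossing labels and predicted crossing probabilities as sketched below (a self-contained numpy version with the usual definitions; the paper's exact evaluation thresholds are not specified here, so the 0.5 decision threshold is an assumption).

```python
import numpy as np

def accuracy(y_true, y_prob, thr=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    return float(np.mean((y_prob >= thr) == y_true))

def f1_score(y_true, y_prob, thr=0.5):
    """Harmonic mean of precision and recall for the positive (crossing) class."""
    pred = y_prob >= thr
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def auc(y_true, y_prob):
    """ROC AUC as the probability that a random positive is ranked above
    a random negative (ties count one half)."""
    pos = y_prob[y_true == 1]
    neg = y_prob[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return float(wins / (len(pos) * len(neg)))

# Toy example: 5 pedestrians, 1 = will cross.
y = np.array([1, 0, 1, 1, 0])
p = np.array([0.9, 0.2, 0.6, 0.4, 0.7])
print(accuracy(y, p), round(auc(y, p), 3), round(f1_score(y, p), 3))
# -> 0.6 0.667 0.667
```

Accuracy and F1 depend on the decision threshold, while AUC is threshold-free, which is why papers on imbalanced crossing/not-crossing data typically report all three.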