VIT-Ped: Visionary Intention Transformer for Pedestrian Behavior Analysis

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work proposes a novel multi-scale Transformer architecture that integrates multimodal inputs, such as visual and motion cues, to enhance the accuracy of pedestrian crossing intention prediction for Level 3–4 autonomous driving. For the first time, a video Vision Transformer (ViT) is introduced to this task, leveraging its capacity to capture long-range spatiotemporal dependencies. The effectiveness of the proposed design is rigorously validated through systematic ablation studies. Evaluated on the JAAD dataset, the model achieves state-of-the-art performance, significantly outperforming existing methods across key metrics including Accuracy, AUC, and F1-score. These results demonstrate the potential of the approach to improve pedestrian interaction safety in high-level autonomous driving systems.

๐Ÿ“ Abstract
Pedestrian intention prediction is one of the key technologies in the transition from Level 3 to Level 4 autonomous driving. To understand pedestrian crossing behaviour, several elements and features should be taken into consideration to make the roads of tomorrow safer for everybody. We introduce algorithms based on transformers and video vision transformers, in different sizes, that use different data modalities. We evaluated our algorithms on a popular pedestrian behaviour dataset, JAAD, and reached and surpassed state-of-the-art (SOTA) performance in metrics such as Accuracy, AUC, and F1-score. The advantages brought by different model design choices are investigated via extensive ablation studies.
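For readers unfamiliar with the three metrics named in the abstract, the following is a minimal sketch of how Accuracy, F1-score, and AUC are commonly computed for a binary crossing / not-crossing classifier. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made up for illustration, and none of the numbers relate to JAAD or to the paper's results.

```python
# Illustrative only: toy labels and scores, not data from the paper.

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive sample is scored
    higher than a random negative one (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical ground truth: 1 = pedestrian will cross, 0 = will not cross.
y_true = [1, 0, 1, 1, 0, 0]
# Hypothetical model confidence that the pedestrian will cross.
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]
# Assumed decision threshold of 0.5 to turn scores into hard labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"Accuracy: {accuracy(y_true, y_pred):.3f}")
print(f"F1-score: {f1_score(y_true, y_pred):.3f}")
print(f"AUC:      {roc_auc(y_true, scores):.3f}")
```

Accuracy and F1-score depend on the chosen threshold, while AUC is computed from the raw scores and is threshold-independent, which is one reason the three are often reported together.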
Problem

Research questions and friction points this paper is trying to address.

Pedestrian Intention Prediction
Autonomous Driving
Behavior Analysis
Crossing Behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision Transformer
Pedestrian Intention Prediction
Multimodal Fusion
Autonomous Driving
Ablation Study
Aly R. Elkammar
German University in Cairo, Faculty of Media Engineering and Technology, Computer Science and Engineering Department, Cairo, Egypt
Karim M. Gamaleldin
German University in Cairo, Faculty of Media Engineering and Technology, Computer Science and Engineering Department, Cairo, Egypt
Catherine M. Elias
German University in Cairo
System Architecture, Cooperative Systems, Intelligent Transportation Systems, Connected and Automated Vehicles (CAVs)