REACT: A Real-Time Edge-AI Based V2X Framework for Accident Avoidance in Autonomous Driving System

📅 2025-08-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing V2X cooperative perception methods for autonomous driving suffer from poor generalizability, shallow contextual reasoning, and over-reliance on single-modal inputs, and they fail to adequately address multi-vehicle collisions caused by human error. Meanwhile, vision-language models (VLMs) struggle to balance real-time performance with safety-critical reliability. Method: This paper proposes REACT, a lightweight vision-language-driven V2X cooperative perception and trajectory optimization framework. It introduces a language-guided contextual reasoning mechanism that integrates multimodal sensor inputs, risk-aware trajectory planning, and edge-deployment optimization. Contribution/Results: The framework achieves unified semantic-level scene understanding and end-to-end real-time decision-making. After fine-tuning for the Jetson AGX Orin platform, it reduces the collision rate by 77%, reaches 48.2% Video Panoptic Quality (VPQ), and attains an inference latency of only 0.57 seconds on the DeepAccident benchmark, setting a new state of the art for this task.

📝 Abstract
Collisions caused by human error are the most common type of multi-vehicle crash, highlighting the critical need for autonomous driving (AD) systems to leverage cooperative perception through Vehicle-to-Everything (V2X) communication. This capability extends situational awareness beyond the limitations of onboard sensors. However, current transformer-based V2X frameworks suffer from limited generalization, shallow contextual reasoning, and reliance on mono-modal inputs. Vision-Language Models (VLMs) offer enhanced reasoning and multimodal integration but typically fall short of real-time performance requirements in safety-critical applications. This paper presents REACT, a real-time, V2X-integrated trajectory optimization framework built upon a fine-tuned lightweight VLM. REACT integrates a set of specialized modules that process multimodal inputs into optimized, risk-aware trajectories. To ensure real-time performance on edge devices, REACT incorporates edge adaptation strategies that reduce model complexity and accelerate inference. Evaluated on the DeepAccident benchmark, REACT achieves state-of-the-art performance, a 77% collision rate reduction, a 48.2% Video Panoptic Quality (VPQ), and a 0.57-second inference latency on the Jetson AGX Orin. Ablation studies validate the contribution of each input, module, and edge adaptation strategy. These results demonstrate the feasibility of lightweight VLMs for real-time edge-based cooperative planning and showcase the potential of language-guided contextual reasoning to improve safety and responsiveness in autonomous driving.
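The abstract describes REACT as converting multimodal, V2X-fused perception into optimized, risk-aware trajectories. As a rough illustration only (the class names, cost terms, and weights below are assumptions for this sketch, not REACT's actual modules or API), risk-aware trajectory selection can be viewed as picking the candidate that minimizes a weighted collision-risk plus comfort cost:

```python
import math
from dataclasses import dataclass

# Illustrative sketch: not REACT's implementation. The risk score is assumed
# to come from fused V2X + onboard perception upstream of this step.

@dataclass
class Trajectory:
    waypoints: list          # (x, y) points along the candidate path
    collision_risk: float    # fused risk score in [0, 1]

def select_trajectory(candidates, risk_weight=1.0, comfort_weight=0.1):
    """Return the candidate minimizing weighted risk + comfort cost."""
    def cost(traj):
        wp = traj.waypoints
        # Comfort proxy: magnitude of second differences (path "jerkiness").
        jerk = sum(
            math.hypot((x2 - x1) - (x1 - x0), (y2 - y1) - (y1 - y0))
            for (x0, y0), (x1, y1), (x2, y2) in zip(wp, wp[1:], wp[2:])
        )
        return risk_weight * traj.collision_risk + comfort_weight * jerk
    return min(candidates, key=cost)

safe = Trajectory([(0, 0), (1, 0), (2, 0)], collision_risk=0.1)
risky = Trajectory([(0, 0), (1, 1), (2, 0)], collision_risk=0.9)
best = select_trajectory([safe, risky])  # picks the low-risk, smooth path
```

In the paper's framing, the interesting part is upstream of this step: the fine-tuned VLM supplies the contextual reasoning that produces the risk estimates, while edge adaptation keeps the whole loop within the 0.57 s latency budget on the Jetson AGX Orin.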
Problem

Research questions and friction points this paper is trying to address.

Enhancing autonomous driving safety via real-time V2X communication
Overcoming limitations of current transformer-based V2X frameworks
Achieving real-time performance with lightweight Vision-Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight VLM for real-time edge processing
Multimodal input integration for risk-aware trajectories
Edge adaptation strategies to accelerate inference
Fengze Yang
University of Utah
Traffic Safety · AI · Large Language Model · V2X
Bo Yu
Department of Civil & Environmental Engineering, University of Utah
Yang Zhou
Zachry Department of Civil and Environmental Engineering, Texas A&M University
Xuewen Luo
Department of Civil & Environmental Engineering, University of Utah
Zhengzhong Tu
Texas A&M University, Google Research, University of Texas at Austin
Agentic AI · Trustworthy AI · Embodied AI
Chenxi Liu
Department of Civil & Environmental Engineering, University of Utah