V2X-RECT: An Efficient V2X Trajectory Prediction Framework via Redundant Interaction Filtering and Tracking Error Correction

2025-11-22
AI Summary
In dense traffic scenarios, V2X trajectory prediction faces challenges including frequent target identity switching, inconsistent cross-view association, redundant multi-source information interaction, and inefficient, vehicle-centric encoding that causes historical feature recomputation and poor real-time performance. To address these issues, this paper proposes an efficient and robust multi-source collaborative trajectory prediction framework. Key contributions include: (1) a multi-source identity matching and correction module integrated with traffic-signal-guided interaction filtering to mitigate misalignment and identity jumps; (2) local spatiotemporal coordinate encoding coupled with dynamic selection of key interacting vehicles, enabling historical feature reuse and parallel decoding; and (3) multi-view spatiotemporal relational modeling with traffic-signal trend encoding to enhance feature fusion efficiency. Evaluated on V2X-Seq and V2X-Traj benchmarks, the method achieves significant improvements in prediction accuracy, cross-density robustness, and real-time inference latency.
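The summary credits local spatiotemporal coordinate encoding with enabling historical feature reuse. The minimal Python sketch below shows why local frames make reuse possible. It assumes a simple per-step feature: the next position expressed in the frame of the current step, with the x-axis along the previous motion direction. The names step_feature, FeatureCache, rigid_transform and "veh_7" are hypothetical, and this is not the paper's encoder. Such a feature does not change under any global rotation and translation. It can therefore be computed once per (agent, time step) and shared across target agents and overlapping time windows, rather than re-encoded in every vehicle-centric frame.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def to_local(p: Point, origin: Point, heading: float) -> Point:
    """Express global point p in a frame centred at origin whose x-axis points along heading."""
    dx, dy = p[0] - origin[0], p[1] - origin[1]
    c, s = math.cos(heading), math.sin(heading)
    return (c * dx + s * dy, -s * dx + c * dy)


def step_feature(track: List[Point], t: int) -> Point:
    """Hypothetical per-step feature: position at t+1 in the local frame of step t.

    The frame's heading is the motion direction from step t-1 to step t, so the
    result depends only on local geometry, not on any global or ego-centred frame.
    """
    prev, cur, nxt = track[t - 1], track[t], track[t + 1]
    heading = math.atan2(cur[1] - prev[1], cur[0] - prev[0])
    return to_local(nxt, cur, heading)


class FeatureCache:
    """Hypothetical cache: each (agent, step) feature is computed once and then reused."""

    def __init__(self) -> None:
        self.store: Dict[Tuple[str, int], Point] = {}
        self.computed = 0  # number of features actually encoded

    def get(self, agent_id: str, track: List[Point], t: int) -> Point:
        key = (agent_id, t)
        if key not in self.store:
            self.store[key] = step_feature(track, t)
            self.computed += 1
        return self.store[key]


def rigid_transform(track: List[Point], theta: float, shift: Point) -> List[Point]:
    """Rotate the whole track by theta, then translate it by shift (a change of global frame)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in track]


if __name__ == "__main__":
    track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5), (3.0, 1.5), (3.5, 2.5)]
    moved = rigid_transform(track, 0.7, (10.0, -4.0))
    for t in range(1, 4):
        # The same feature is obtained before and after the global change of frame.
        print(step_feature(track, t), step_feature(moved, t))

    cache = FeatureCache()
    for t in (1, 2):  # first time window
        cache.get("veh_7", track, t)
    for t in (2, 3):  # next window overlaps at step 2, which is served from the cache
        cache.get("veh_7", track, t)
    print(cache.computed)  # 3 encodings for 4 lookups
```

Because the overlapping step is served from the cache, the two windows need three encodings instead of four. The saving grows with window overlap and with the number of target agents that share the same neighbours.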

Abstract
V2X prediction can alleviate the perception incompleteness caused by limited line of sight by fusing trajectory data from infrastructure and vehicles, which is crucial to traffic safety and efficiency. However, in dense traffic scenarios, frequent identity switching of targets hinders cross-view association and fusion. Meanwhile, multi-source information tends to generate redundant interactions during the encoding stage, and traditional vehicle-centric encoding causes large amounts of historical trajectory features to be encoded repeatedly, degrading real-time inference performance. To address these challenges, we propose V2X-RECT, a trajectory prediction framework designed for high-density environments. It enhances data association consistency, reduces redundant interactions, and reuses historical information to enable more efficient and accurate prediction. Specifically, we design a multi-source identity matching and correction module that leverages multi-view spatiotemporal relationships to achieve stable and consistent target association, mitigating the adverse effects of mismatches on trajectory encoding and cross-view feature fusion. We then introduce a traffic signal-guided interaction module that encodes the trend of traffic light changes as features and exploits the role of signals in constraining spatiotemporal rights of passage to accurately filter key interacting vehicles, while capturing the dynamic impact of signal changes on interaction patterns. Furthermore, a local spatiotemporal coordinate encoding makes the features of historical trajectories and the map reusable, supporting parallel decoding and significantly improving inference efficiency. Extensive experiments on the V2X-Seq and V2X-Traj datasets demonstrate that V2X-RECT achieves significant improvements over SOTA methods while also enhancing robustness and inference efficiency across diverse traffic densities.
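The abstract does not say how the multi-source identity matching and correction module works internally. As an illustration only, the sketch below substitutes a common baseline that is not taken from the paper. In each frame it performs greedy one-to-one nearest-neighbour matching between infrastructure and vehicle-side detections. It then merges every infrastructure ID that matches the same vehicle-side track into one canonical ID, so an identity switch on the infrastructure side collapses back into a single identity. The 2 m gating distance, the function names match_frame and correct_infra_ids, and the track IDs are all assumptions.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Frame = Dict[str, Point]  # track ID -> (x, y) position at one timestamp


def match_frame(infra: Frame, veh: Frame, max_dist: float) -> Dict[str, str]:
    """Greedy one-to-one nearest-neighbour matching: vehicle-side ID -> infrastructure ID."""
    pairs = sorted(
        (math.dist(p, q), v_id, i_id)
        for v_id, p in veh.items()
        for i_id, q in infra.items()
    )
    used_v, used_i, out = set(), set(), {}
    for d, v_id, i_id in pairs:
        if d > max_dist:
            break  # pairs are sorted by distance, so every remaining pair is too far apart
        if v_id in used_v or i_id in used_i:
            continue  # keep the matching one-to-one
        out[v_id] = i_id
        used_v.add(v_id)
        used_i.add(i_id)
    return out


def correct_infra_ids(
    infra_seq: List[Frame], veh_seq: List[Frame], max_dist: float = 2.0
) -> Dict[str, str]:
    """Map each infrastructure ID to a canonical ID.

    The canonical ID is the first infrastructure ID ever matched to the same
    vehicle-side track, so IDs that switched mid-track are merged.
    """
    canonical_for_vehicle: Dict[str, str] = {}
    remap: Dict[str, str] = {}
    for infra, veh in zip(infra_seq, veh_seq):
        for v_id, i_id in match_frame(infra, veh, max_dist).items():
            canon = canonical_for_vehicle.setdefault(v_id, i_id)
            remap.setdefault(i_id, canon)
    return remap


if __name__ == "__main__":
    # Infrastructure track "i3" switches to "i9" in the last frame.
    infra_seq = [
        {"i3": (0.0, 0.0), "i5": (10.0, 0.0)},
        {"i3": (1.0, 0.0), "i5": (10.0, 1.0)},
        {"i9": (2.0, 0.0), "i5": (10.0, 2.0)},
    ]
    veh_seq = [
        {"v1": (0.3, 0.1), "v2": (9.8, 0.2)},
        {"v1": (1.2, 0.0), "v2": (10.1, 1.1)},
        {"v1": (2.1, -0.1), "v2": (9.9, 2.0)},
    ]
    print(correct_infra_ids(infra_seq, veh_seq))  # "i9" is mapped back to "i3"
```

Greedy matching is used here for brevity. An optimal assignment (for example, the Hungarian algorithm) and a cost that aggregates distance over several frames would be more robust when targets are close together, which is exactly the dense-traffic case the paper targets.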
Problem

Research questions and friction points this paper is trying to address.

Addresses target identity switching issues in dense V2X environments
Reduces redundant interactions and repetitive trajectory encoding
Improves real-time inference efficiency while maintaining prediction accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-source identity matching for stable target association
Traffic signal-guided interaction to filter key vehicles
Local spatiotemporal encoding for reusable historical features
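To make traffic signal-guided filtering of key vehicles concrete, here is a hedged Python sketch built on assumed rules that the source does not state. A neighbour is dropped if it is nearly stopped in a lane whose light is red and will stay red for the whole prediction horizon, since it has no right of passage in that window. The remaining neighbours within a radius are ranked by distance and truncated to k. The signal "trend" is encoded as a one-hot state plus the fraction of the horizon left before the next change. The classes Signal and Agent, the thresholds (3 s horizon, 50 m radius, k = 4, 0.5 m/s stop speed), and the assumption that time-to-change is known per lane are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Signal:
    state: str        # "green", "yellow" or "red"
    remaining: float  # seconds until the next change (assumed to be available)


@dataclass
class Agent:
    agent_id: str
    x: float
    y: float
    speed: float      # m/s
    lane_id: str


def signal_trend(sig: Signal, horizon: float) -> Tuple[float, float, float, float]:
    """Hypothetical trend feature: one-hot state plus the fraction of the horizon before it changes."""
    onehot = tuple(1.0 if sig.state == s else 0.0 for s in ("green", "yellow", "red"))
    return onehot + (min(sig.remaining, horizon) / horizon,)


def has_right_of_way(sig: Signal, horizon: float) -> bool:
    """Assumed rule: a lane may move within the horizon unless it stays red for all of it."""
    return not (sig.state == "red" and sig.remaining >= horizon)


def filter_key_neighbors(
    ego: Agent,
    others: List[Agent],
    signals: Dict[str, Signal],
    horizon: float = 3.0,
    radius: float = 50.0,
    k: int = 4,
    stop_speed: float = 0.5,
) -> List[str]:
    """Return the IDs of up to k nearby neighbours that can still interact with the ego vehicle."""
    key = []
    for a in others:
        d = math.hypot(a.x - ego.x, a.y - ego.y)
        if d > radius:
            continue  # too far away to matter within the horizon
        sig = signals.get(a.lane_id)
        if sig is not None and a.speed < stop_speed and not has_right_of_way(sig, horizon):
            continue  # queued at a red light for the whole horizon: treated as non-interacting
        key.append((d, a.agent_id))
    key.sort()
    return [aid for _, aid in key[:k]]


if __name__ == "__main__":
    ego = Agent("ego", 0.0, 0.0, 8.0, "L0")
    others = [
        Agent("a", 10.0, 0.0, 5.0, "L1"),   # moving on a green lane
        Agent("b", 5.0, 5.0, 0.0, "L2"),    # stopped, red for another 10 s
        Agent("c", 8.0, 0.0, 0.0, "L3"),    # stopped, but red turns green in 1 s
        Agent("d", 100.0, 0.0, 10.0, "L1"), # outside the radius
    ]
    signals = {"L1": Signal("green", 20.0), "L2": Signal("red", 10.0), "L3": Signal("red", 1.0)}
    print(filter_key_neighbors(ego, others, signals))  # ['c', 'a']
    print(signal_trend(signals["L3"], 3.0))
```

The example shows how the signal trend matters for filtering: vehicle "c" is currently stopped at a red light, yet it is kept because its light changes inside the horizon. A purely state-based rule would have discarded it.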
Authors

Xiangyan Kong
Communication Research Center, Harbin Institute of Technology, Harbin, 150001, China

Xuecheng Wu
School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710049, China

Xiongwei Zhao
Ph.D. Candidate, Harbin Institute of Technology
3D Perception, World Model, LLM, Embodied AI, Autonomous System

Xiaodong Li
Communication Research Center, Harbin Institute of Technology, Harbin, 150001, China

Yunyun Shi
School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710049, China

Gang Wang
Communication Research Center, Harbin Institute of Technology, Harbin, 150001, China

Dingkang Yang
ByteDance
Multimodal Learning, Generative AI, Embodied AI

Yang Liu
College of Electronic and Information Engineering, Tongji University, Shanghai, 201804, China

Hong Chen
College of Electronic and Information Engineering, Tongji University, Shanghai, 201804, China

Yulong Gao
Communication Research Center, Harbin Institute of Technology, Harbin, 150001, China