🤖 AI Summary
Traditional approaches decouple 3D object detection from trajectory forecasting, hindering effective spatiotemporal dependency modeling and allowing errors to accumulate. This paper addresses vision-based 3D perception for autonomous driving with ForeSight, a streaming multi-task framework that performs detection and forecasting jointly. A forecast-aware detection transformer explicitly incorporates trajectory priors, drawn from a multi-hypothesis forecast memory queue, into the detection process, while a streaming forecast transformer enables end-to-end, tracking-free temporal modeling via query memory shared across frames. Evaluated on nuScenes, the method achieves 54.9% EPA, a 9.3% absolute improvement over prior methods, along with state-of-the-art mAP and minADE, significantly enhancing dynamic scene understanding.
📝 Abstract
We introduce ForeSight, a novel joint detection and forecasting framework for vision-based 3D perception in autonomous vehicles. Traditional approaches treat detection and forecasting as separate sequential tasks, limiting their ability to leverage temporal cues. ForeSight addresses this limitation with a multi-task streaming and bidirectional learning approach, allowing detection and forecasting to share query memory and propagate information seamlessly. The forecast-aware detection transformer enhances spatial reasoning by integrating trajectory predictions from a multiple-hypothesis forecast memory queue, while the streaming forecast transformer improves temporal consistency using past forecasts and refined detections. Unlike tracking-based methods, ForeSight eliminates the need for explicit object association, reducing error propagation with a tracking-free model that efficiently scales across multi-frame sequences. Experiments on the nuScenes dataset show that ForeSight achieves state-of-the-art performance, with an EPA of 54.9%, surpassing previous methods by 9.3%, while also attaining the best mAP and minADE among multi-view detection and forecasting models.
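To make the memory mechanism described above concrete, here is a minimal, hypothetical sketch of a multi-hypothesis forecast memory queue: per-frame, per-query trajectory hypotheses are pruned to the top K by score and retained over a sliding window, so a later detection step can retrieve them as priors. The class name, data layout, and parameters are illustrative assumptions, not the paper's actual implementation.

```python
from collections import deque

class ForecastMemoryQueue:
    """Hypothetical sketch of a multi-hypothesis forecast memory queue.

    Stores, for each recent frame, up to K trajectory hypotheses per
    object query so later detection steps can condition on forecast priors.
    """

    def __init__(self, max_frames=4, num_hypotheses=3):
        self.num_hypotheses = num_hypotheses
        self.frames = deque(maxlen=max_frames)  # oldest frame drops off automatically

    def push(self, forecasts):
        """forecasts: dict mapping query id -> list of (score, trajectory)."""
        # Keep only the top-K hypotheses per query, highest score first.
        pruned = {
            qid: sorted(hyps, key=lambda h: h[0], reverse=True)[: self.num_hypotheses]
            for qid, hyps in forecasts.items()
        }
        self.frames.append(pruned)

    def priors_for(self, qid):
        """Collect all stored hypotheses for one query, newest frame first,
        for use as trajectory priors in forecast-aware detection."""
        out = []
        for frame in reversed(self.frames):
            out.extend(frame.get(qid, []))
        return out
```

The sliding window (`deque(maxlen=...)`) bounds memory regardless of sequence length, which matches the streaming setting: each new frame pushes fresh forecasts and evicts the oldest, so retrieval cost stays constant across long drives.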