🤖 AI Summary
This work addresses the challenge of real-time, low-latency, and temporally consistent multi-agent trajectory prediction in continuous autonomous driving scenarios by proposing a lightweight streaming trajectory prediction method. The approach introduces an endpoint-aware modeling mechanism that leverages the endpoints of previously predicted trajectories as anchors to guide scene encoding. By integrating temporal context propagation with a streaming inference architecture, it eliminates the need for multi-stage refinement or segment-wise decoding, thereby significantly enhancing temporal consistency and inference efficiency. Evaluated on the Argoverse 2 benchmark, the method achieves state-of-the-art performance in both single-agent and multi-agent settings for streaming trajectory prediction while substantially reducing computational overhead and inference latency.
📝 Abstract
Future trajectories of neighboring traffic agents have a significant influence on the path planning and decision-making of autonomous vehicles. While trajectory forecasting is a well-studied field, research mainly focuses on snapshot-based prediction, where each scenario is treated independently of its global temporal context. However, real-world autonomous driving systems need to operate in a continuous setting, requiring real-time processing of data streams with low latency and consistent predictions over successive timesteps. We leverage this continuous setting to propose a lightweight yet highly accurate streaming-based trajectory forecasting approach. We integrate valuable information from previous predictions with a novel endpoint-aware modeling scheme. Our temporal context propagation uses the trajectory endpoints of the previous forecasts as anchors to extract targeted scenario context encodings. Our approach efficiently guides its scene encoder to extract highly relevant context information without needing refinement iterations or segment-wise decoding. Our experiments highlight that our approach effectively relays information across consecutive timesteps. Unlike methods using multi-stage refinement processing, our approach significantly reduces inference latency, making it well-suited for real-world deployment. We achieve state-of-the-art streaming trajectory prediction results on the Argoverse 2 multi-agent and single-agent benchmarks, while requiring substantially fewer resources.
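To make the endpoint-as-anchor idea concrete, here is a minimal, hypothetical sketch (not the paper's actual architecture): the endpoint of the previous timestep's forecast is used to softly weight scene features by proximity, producing a targeted context vector that can be fed to the next prediction step. The function name, the distance-based weighting, and the temperature `tau` are all illustrative assumptions.

```python
import math

def endpoint_context(prev_endpoint, feat_pos, feats, tau=5.0):
    """Soft-attend over scene features, weighting each feature by its
    proximity to the previous forecast's endpoint.

    This is a simplified stand-in for endpoint-aware scene encoding:
    a real model would use learned attention, not a fixed kernel.
    """
    # distance-based softmax-style weights (closer features dominate)
    ws = [math.exp(-math.dist(prev_endpoint, p) / tau) for p in feat_pos]
    total = sum(ws)
    ws = [w / total for w in ws]
    # weighted sum of feature vectors -> one targeted context encoding
    dim = len(feats[0])
    return [sum(w * f[k] for w, f in zip(ws, feats)) for k in range(dim)]

# toy scene: four map features with 2-D positions and 3-D embeddings
positions = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (50.0, 50.0)]
features = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
anchor = (9.0, 1.0)  # endpoint of the previous timestep's forecast
ctx = endpoint_context(anchor, positions, features)
```

With the anchor near the second feature, the resulting context vector is dominated by that feature's embedding, illustrating how the previous endpoint steers the encoder toward the most relevant part of the scene.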