Planning-aligned Token Compression for Long-Context Autonomous Driving

📅 2026-06-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the computational burden of long-horizon vision-action models in autonomous driving, where excessive contextual length leads to high computational costs, and existing heuristic token compression methods—decoupled from planning objectives—often discard critical decision-making information. To resolve this, the authors propose COMPACT-VA, a framework that explicitly aligns token compression with driving planning for the first time. It employs a conditional VQ-VAE to jointly encode historical observations and planning intent distilled from future trajectories, learning a bounded working memory representation that is co-optimized with an end-to-end policy. Evaluated in highly dynamic scenarios, COMPACT-VA improves success rates by over 6% (reaching 68.3%) while achieving a 3.3× speedup in inference and a 2.7× reduction in memory usage, effectively balancing efficiency and performance.

📝 Abstract

Monolithic vision-action models represent an emerging paradigm in autonomous driving. However, this architecture produces token sequences that quickly exceed real-time computational budgets when encoding extended temporal context for complex interactions. While approaches like linear transformers and external memory try to make the context lightweight, token compression is most compatible with the architecture as it requires no backbone modifications. Yet existing compression methods adopt rule-based heuristics like temporal decay that are decoupled from planning, risking the loss of decision-critical information. We propose COMPACT-VA, a planning-aligned working memory framework built on a conditional VQ-VAE that compresses extended context into bounded representations. Compression is conditioned on both the historical trajectory and a learned planning intent that the posterior encoder distills from future trajectories during training, while the prior encoder learns to predict it from compressed observations. The compressed memory, concatenated with the predicted latent, feeds the policy for end-to-end optimization, so that planning retains decision-critical information. We evaluate on high-signal dynamic scenarios where historical context is most critical for behavior correctness (e.g., stop, yield, or proceed), and accordingly design behavioral metrics. Under comparable token budgets, we achieve a >6% improvement in success rate (reaching 68.3%), with consistent gains across metrics. Ablations validate the effectiveness of the planning-aligned coupling. Closed-loop evaluation confirms that COMPACT-VA maintains general driving performance with a 3.3× speedup and a 2.7× memory reduction over uncompressed processing.
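The abstract names two mechanisms that a small sketch can make concrete: a working memory whose size stays bounded however long the history grows, and a vector-quantized intent latent that is concatenated with that memory before reaching the policy. The Python below is an illustrative toy under stated assumptions, not the authors' implementation. Average pooling stands in for the learned compressor, the intent vector is supplied by hand rather than predicted by a prior encoder, and all names (`pool_to_slots`, `nearest_code`, `build_policy_input`, `num_slots`) are hypothetical.

```python
import math
import random


def nearest_code(vec, codebook):
    """Return the index of the codebook entry closest to vec (squared L2).

    This is the standard vector-quantization lookup used in VQ-VAE models.
    """
    best_idx, best_dist = 0, math.inf
    for i, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx


def pool_to_slots(tokens, num_slots):
    """Average-pool a non-empty token list into exactly num_slots vectors.

    Assumption: a stand-in for the learned compressor, used only to show
    that the output size does not depend on the history length.
    """
    n, dim = len(tokens), len(tokens[0])
    slots = []
    for s in range(num_slots):
        lo = s * n // num_slots
        hi = max(lo + 1, (s + 1) * n // num_slots)
        chunk = tokens[lo:hi]
        slots.append([sum(t[d] for t in chunk) / len(chunk) for d in range(dim)])
    return slots


def build_policy_input(history, intent_vec, codebook, num_slots):
    """Concatenate the bounded memory with the quantized intent latent."""
    memory = pool_to_slots(history, num_slots)
    idx = nearest_code(intent_vec, codebook)
    return memory + [codebook[idx]], idx


if __name__ == "__main__":
    random.seed(0)
    codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # hypothetical intent codes
    for length in (50, 500):
        history = [[random.random(), random.random()] for _ in range(length)]
        tokens, idx = build_policy_input(history, [0.9, 0.1], codebook, num_slots=8)
        print(length, "history tokens ->", len(tokens), "policy tokens, code", idx)
```

In this toy, both the 50-token and the 500-token history reach the policy as 9 tokens (8 memory slots plus one intent code), which is the bounded-budget property the abstract describes. In the framework as described, the compressor and both encoders are learned jointly with the policy rather than fixed as they are here.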

Problem

Research questions and friction points this paper is trying to address.

token compression

autonomous driving

long-context

planning alignment

decision-critical information

Innovation

Methods, ideas, or system contributions that make the work stand out.

planning-aligned compression

conditional VQ-VAE

token compression