C$^3$ache: Accelerating World Action Models with Cross Inference Chunk Cache

📅 2026-06-07

🤖 AI Summary

This work addresses the high computational overhead of World Action Models (WAMs) during task execution. The overhead comes from running inference over many chunks, each of which needs a time-consuming denoising process. Existing acceleration methods cache and reuse computation within a single chunk's denoising trajectory, so they miss redundancy across chunks. This study identifies and exploits residual redundancy between adjacent inference chunks at the same denoising step. The authors propose a training-free mechanism that caches residuals in one chunk and reuses them in the next, without altering the model architecture. Evaluated with a Fast-WAM backbone, the method reaches up to a 2.5× speedup in total wall-clock inference time, with negligible degradation in task success rate.
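The 2.5× figure can be made concrete. A speedup of that size in total wall-clock time means the accelerated run takes 40% of the baseline time. The second line below applies Amdahl's law, which is a general bound and not an analysis from the paper. Here p is an assumed fraction of baseline time spent in the computation that caching can skip, and s is the speedup on that fraction. Reaching 2.5× therefore requires p ≥ 0.6. This fits the abstract's description of per-chunk denoising as the costly part.

```latex
\begin{aligned}
S &= \frac{T_{\mathrm{base}}}{T_{\mathrm{cache}}} = 2.5
  \quad\Rightarrow\quad T_{\mathrm{cache}} = 0.4\,T_{\mathrm{base}}, \\
S &= \frac{1}{(1-p) + p/s} \le \frac{1}{1-p}
  \quad\Rightarrow\quad p \ge 1 - \frac{1}{S} = 0.6 .
\end{aligned}
```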

📝 Abstract

World Action Models (WAMs) generalize better than standard Vision-Language-Action (VLA) policies to novel motions and environments, because a video-modeling objective lets them learn from abundant unlabeled video rather than scarce labeled robot demonstrations. This generalization is computationally expensive. To complete a task, a WAM runs over multiple inference chunks, and each chunk requires a costly denoising process. Existing acceleration methods reduce this cost by caching and reusing computation within a single chunk's denoising trajectory. Our empirical analysis reveals a substantial source of redundancy they overlook: redundancy across chunks. When a robot executes a smooth behavior, the residuals computed at a given denoising step are strongly correlated from one chunk to the next. We introduce C$^3$ache, a training-free method that caches and reuses these residuals across inference chunks at the same denoising step. Experiments on benchmarks with a Fast-WAM backbone show that C$^3$ache achieves up to a $2.5\times$ speedup in total wall-clock inference time, with negligible degradation in task success rate.
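The sketch below illustrates the cross-chunk idea in the abstract. It is a minimal sketch under stated assumptions, not the authors' implementation.

The abstract does not say which layers the cached residual spans or when the cache is refreshed. This sketch assumes the following:

- The residual is the output minus the input of a hypothetical `expensive_blocks` stack.
- A fixed `refresh_every` schedule stands in for whatever reuse criterion C$^3$ache actually uses.
- `CrossChunkResidualCache`, `run_chunks`, and the toy update rule are illustrative names only.

The key point is the cache key: residuals are indexed by denoising step, so chunk c+1 at step t reuses what chunk c computed at step t.

```python
import numpy as np


class CrossChunkResidualCache:
    """Hypothetical cross-chunk residual cache (illustrative sketch only).

    For each denoising step index, store the residual (output minus input) of an
    expensive block stack computed in an earlier inference chunk, and add that
    residual back in a later chunk at the same step instead of recomputing it.
    """

    def __init__(self, refresh_every: int = 2):
        # Assumed schedule: recompute every `refresh_every`-th chunk and reuse
        # cached residuals in the chunks in between (not the paper's criterion).
        self.refresh_every = refresh_every
        self.residuals = {}  # denoising step index -> cached residual array
        self.full_calls = 0  # number of times the expensive blocks actually ran

    def forward(self, chunk_idx, step, x, blocks):
        reuse = step in self.residuals and chunk_idx % self.refresh_every != 0
        if reuse:
            # Skip the expensive blocks: approximate their output using the
            # residual cached at the same denoising step of a previous chunk.
            return x + self.residuals[step]
        out = blocks(x)
        self.full_calls += 1
        self.residuals[step] = out - x  # refresh the cache for later chunks
        return out


def expensive_blocks(x):
    # Hypothetical stand-in for the denoiser's transformer blocks.
    return x + np.tanh(x)


def run_chunks(num_chunks, num_steps, dim, cache, seed=0):
    rng = np.random.default_rng(seed)
    chunks = []
    for c in range(num_chunks):
        x = rng.standard_normal(dim)  # fresh noise for each action chunk
        for t in range(num_steps):
            h = cache.forward(c, t, x, expensive_blocks)
            # Toy update that consumes the block output; not a real sampler.
            x = x - (h - x) / num_steps
        chunks.append(x)
    return chunks


cache = CrossChunkResidualCache(refresh_every=2)
chunks = run_chunks(num_chunks=4, num_steps=3, dim=8, cache=cache)
print(cache.full_calls)  # 6 (a no-cache baseline would run the blocks 12 times)
```

With `refresh_every=2`, four chunks of three denoising steps run the expensive blocks 6 times instead of 12. A practical version would need a rule for when reuse is safe. The abstract ties the residual correlation to smooth robot behavior, so a similarity check on inputs or residuals is one plausible (assumed) alternative to a fixed schedule.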

Problem

Research questions and friction points this paper is trying to address.

World Action Models

Inference acceleration

Cross-chunk redundancy

Denoising process

Computational efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-chunk caching

World Action Models

Inference acceleration