🤖 AI Summary
Existing video generation models struggle to produce immediate, coherent, and physically plausible responses to continuous, time-varying forces. This work proposes StreamForce, a unified causal framework that handles both local and global time-varying force inputs within a single model, without separate training for each force type. It combines a unified force representation, a causal autoregressive architecture, and a distillation-based training strategy to balance force responsiveness with generation efficiency. The reported experiments show StreamForce running at up to 16.6 frames per second on a single GPU while outperforming existing methods in both force adherence and motion realism.
📝 Abstract
We introduce StreamForce, a streaming video generation framework that enables physically grounded control through continuous force inputs. Unlike prior approaches that train separate models for different force types, assume fixed forces, or rely on non-causal processing, StreamForce is a causal and unified model that responds instantly and coherently to both local and global, time-varying forces. To achieve this, we design a unified force representation that serves as the control signal and develop a distillation pipeline for force-controllable video generation. Our model combines autoregressive efficiency with force responsiveness while sustaining stable photometric and dynamic realism. StreamForce runs at up to 16.6 FPS on a single GPU and achieves state-of-the-art performance in both force adherence and motion realism. Project website: https://neu-vi.github.io/StreamForce/
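To make the idea of a single control signal for both force types more concrete, here is a minimal sketch of one way local and global forces could share a representation. The abstract does not specify the actual encoding, so everything below is an assumption for illustration: the per-pixel `(fx, fy)` force map, the Gaussian falloff for local forces, and the names `global_force_map`, `local_force_map`, and `stream_force_maps` are all hypothetical and are not taken from StreamForce.

```python
import numpy as np

def global_force_map(height, width, direction_rad, magnitude):
    """Hypothetical encoding: a global force (e.g. wind) as a uniform (H, W, 2) field."""
    field = np.empty((height, width, 2), dtype=np.float32)
    field[..., 0] = magnitude * np.cos(direction_rad)  # x component
    field[..., 1] = magnitude * np.sin(direction_rad)  # y component
    return field

def local_force_map(height, width, center_xy, direction_rad, magnitude, sigma=8.0):
    """Hypothetical encoding: a local force (e.g. a poke) as a Gaussian-weighted field."""
    ys, xs = np.mgrid[0:height, 0:width]
    cx, cy = center_xy
    # Weight is 1 at the contact point and decays with distance (assumed falloff).
    weight = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
    vec = magnitude * np.array(
        [np.cos(direction_rad), np.sin(direction_rad)], dtype=np.float32
    )
    return weight.astype(np.float32)[..., None] * vec

def stream_force_maps(force_schedule, height, width):
    """Yield one force map per frame, using only the current step's force (causal)."""
    for spec in force_schedule:
        if spec["kind"] == "global":
            yield global_force_map(height, width, spec["direction"], spec["magnitude"])
        else:
            yield local_force_map(
                height, width, spec["center"], spec["direction"], spec["magnitude"]
            )

# A time-varying schedule mixing both force types, one entry per frame.
schedule = [
    {"kind": "global", "direction": 0.0, "magnitude": 1.0},
    {"kind": "local", "center": (8, 8), "direction": np.pi / 2, "magnitude": 3.0},
    {"kind": "global", "direction": np.pi, "magnitude": 0.5},
]
maps = list(stream_force_maps(schedule, height=16, width=16))
```

Because both force types end up as arrays of the same shape, a single model could in principle consume them through one conditioning pathway, and the generator form mirrors the streaming setting in which each frame sees only the forces issued so far. Whether StreamForce encodes forces this way is not stated in the abstract.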