AI Summary
This work addresses the susceptibility of vision-language-action (VLA) policies to trajectory deviations that push them into out-of-distribution states during task execution, even when the task remains physically feasible. To mitigate this issue, the authors propose the B2FF recovery framework, which pre-generates a bank of familiar future-state milestones from the clean initial observation before task execution. At recovery time, the method selects a recoverable milestone from this bank and enforces it as a fixed visual goal that guides the policy back onto a familiar trajectory. Notably, this approach uses imagined future states as a recovery interface without requiring fine-tuning of the low-level action generator. Integrating a vision-language model with future-state imagination and a recoverability-aware milestone selection mechanism, B2FF raises the average success rate of a baseline VLA policy on the failure-injected LIBERO benchmark from 56.3% to 74.0%, under controlled recovery timing aligned with the injected failure.
Abstract
Vision-language-action (VLA) policies can deviate from nominal trajectories during manipulation, even when tasks remain physically feasible. Recovering from these deviations is challenging, as they push the policy into unfamiliar regions of the state space where direct re-planning frequently destabilizes action sequences. We propose Back to the Familiar Future (B2FF), a recovery framework for foresight-driven VLAs that leverages future visual conditioning as a recovery interface. Before execution, the VLA generates a milestone bank of familiar future states conditioned on the clean initial observation. At recovery time, a recoverability-aware selector picks a recovery milestone from this bank and enforces it as a fixed visual goal. This enables the VLA to robustly map off-trajectory observations back to a familiar future. On failure-injected LIBERO, under controlled recovery timing aligned with the injected failure, B2FF increases the average success rate of a baseline VLA from 56.3% to 74.0%, demonstrating that pre-imagined milestones can guide recovery without fine-tuning the low-level action generator.
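The two-phase control flow described in the abstract (build a milestone bank before execution, then select one milestone as a fixed goal at recovery time) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the names `Milestone`, `build_milestone_bank`, and `select_recovery_milestone` are hypothetical, observations are stand-in lists of floats rather than images, and the recoverability score is a toy negative squared distance, not the selector used in B2FF.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Stand-in for an image or its embedding (assumption: a flat list of floats).
Observation = Sequence[float]


@dataclass
class Milestone:
    step: int           # position along the imagined nominal trajectory
    state: Observation  # imagined future state (hypothetical representation)


def build_milestone_bank(
    initial_obs: Observation,
    imagine: Callable[[Observation, int], Observation],
    horizon_steps: Sequence[int],
) -> List[Milestone]:
    """Pre-generate milestones from the clean initial observation only."""
    return [Milestone(step=k, state=imagine(initial_obs, k)) for k in horizon_steps]


def select_recovery_milestone(
    bank: List[Milestone],
    current_obs: Observation,
    recoverability: Callable[[Observation, Milestone], float],
) -> Milestone:
    """Return the milestone with the highest (assumed) recoverability score."""
    if not bank:
        raise ValueError("milestone bank is empty")
    return max(bank, key=lambda m: recoverability(current_obs, m))


# Toy stand-ins (assumptions): a real system would call the VLA's
# future-state generator and a learned or heuristic recoverability estimate.
def toy_imagine(obs: Observation, k: int) -> Observation:
    return [x + k for x in obs]


def toy_recoverability(obs: Observation, m: Milestone) -> float:
    # Closer milestones are treated as easier to reach.
    return -sum((a - b) ** 2 for a, b in zip(obs, m.state))


# Phase 1: before execution, from the clean initial observation.
bank = build_milestone_bank([0.0, 0.0], toy_imagine, horizon_steps=[1, 2, 3])

# Phase 2: at recovery time, from an off-trajectory observation.
goal = select_recovery_milestone(bank, [2.2, 1.9], toy_recoverability)
print(goal.step, goal.state)
```

In this sketch the selected milestone would then be held fixed as the visual goal while the unchanged policy acts. Passing `imagine` and `recoverability` as callables keeps the selection logic independent of how either component is implemented.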