Unleashing VLA Potentials in Autonomous Driving via Explicit Learning from Failures

📅 2026-03-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited exploration capability of Vision-Language-Action (VLA) models under reinforcement learning, which prevents root-cause diagnosis of failures in long-tail driving scenarios. To overcome this, the authors propose ELF-VLA, a framework that introduces an explicit failure-learning mechanism. ELF-VLA replaces sparse scalar rewards with structured diagnostic feedback, generating interpretable failure-mode reports that guide policy refinement. The corrected, high-reward samples produced by this feedback-guided refinement are then injected back into the RL training batch, providing a targeted gradient for feedback-driven optimization. By moving beyond the conventional reliance on scalar rewards in RL, the method achieves state-of-the-art performance on the NAVSIM benchmark, improving PDMS, EPDMS, and high-level planning accuracy.

📝 Abstract
Vision-Language-Action (VLA) models for autonomous driving often hit a performance plateau during Reinforcement Learning (RL) optimization. This stagnation arises from exploration capabilities constrained by prior Supervised Fine-Tuning (SFT), leading to persistent failures in long-tail scenarios. In these critical situations, all explored actions yield a zero-value driving score. This information-sparse reward signals a failure, yet fails to identify its root cause -- whether it stems from incorrect planning, flawed reasoning, or poor trajectory execution. To address this limitation, we propose VLA with Explicit Learning from Failures (ELF-VLA), a framework that augments RL with structured diagnostic feedback. Instead of relying on a vague scalar reward, our method produces detailed, interpretable reports that identify the specific failure mode. The VLA policy then leverages this explicit feedback to generate a Feedback-Guided Refinement. By injecting these corrected, high-reward samples back into the RL training batch, our approach provides a targeted gradient, enabling the policy to solve critical scenarios that unguided exploration cannot. Extensive experiments demonstrate that our method unlocks the latent capabilities of VLA models, achieving state-of-the-art (SOTA) performance on the public NAVSIM benchmark for overall PDMS, EPDMS, and high-level planning accuracy.
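The abstract describes a training loop in which zero-reward rollouts are diagnosed into a failure mode, refined into corrected high-reward samples, and injected back into the RL batch. The sketch below illustrates that control flow only; the `Trajectory` class, the `diagnose`/`refine` stand-ins, and the hard-coded failure modes are hypothetical placeholders, not the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Failure modes named in the abstract (assumed labels, for illustration only).
FAILURE_MODES = ("incorrect_planning", "flawed_reasoning", "poor_execution")

@dataclass
class Trajectory:
    actions: list
    reward: float
    diagnosis: Optional[str] = None  # structured feedback instead of a bare scalar

def diagnose(traj: Trajectory) -> Trajectory:
    """Stand-in diagnostic: tag a zero-reward rollout with a failure-mode report.

    The real framework produces a detailed, interpretable report; here we
    attach a placeholder mode so the overall loop structure is visible.
    """
    if traj.reward == 0.0:
        traj.diagnosis = FAILURE_MODES[0]  # e.g. classified as a planning error
    return traj

def refine(traj: Trajectory) -> Trajectory:
    """Hypothetical feedback-guided refinement: emit a corrected sample."""
    # In the paper, the VLA policy itself generates the refinement from the
    # diagnostic report; here we simply return a high-reward copy.
    return Trajectory(actions=list(traj.actions), reward=1.0)

def build_training_batch(rollouts: list) -> list:
    """Augment the RL batch with refined versions of diagnosed failures."""
    batch = list(rollouts)
    for traj in rollouts:
        diagnose(traj)
        if traj.diagnosis is not None:
            batch.append(refine(traj))  # inject targeted, high-reward sample
    return batch
```

Under this reading, the injected samples are what supply the "targeted gradient" the abstract mentions: the policy update sees corrected behavior for exactly the scenarios where unguided exploration produced only zero-value actions.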
Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action
Reinforcement Learning
Failure Diagnosis
Autonomous Driving
Long-tail Scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language-Action
Explicit Learning from Failures
Reinforcement Learning
Failure Diagnosis
Autonomous Driving