Scalable Offline Metrics for Autonomous Driving

📅 2025-10-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Offline evaluation metrics for autonomous driving correlate poorly with online performance and fail to anticipate the cumulative errors that arise in closed-loop operation. To address this, we propose a novel offline metric grounded in epistemic uncertainty, integrating perception and planning within a unified evaluation framework. Our approach leverages large-scale closed-loop simulation and real-world testing on ground-truth-annotated datasets to systematically analyze evaluation biases across multiple existing metrics. Experimental results show that the proposed metric significantly improves the detection of latent high-risk scenarios and achieves over 13% higher correlation with online safety outcomes than conventional metrics in both simulation and real-world settings, with particularly pronounced gains in real-road testing, validating its generalizability and engineering practicality.

📝 Abstract
Real-world evaluation of perception-based planning models for robotic systems, such as autonomous vehicles, can be safely and inexpensively conducted offline, i.e., by computing model prediction error over a pre-collected validation dataset with ground-truth annotations. However, extrapolating from offline model performance to online settings remains a challenge. In these settings, seemingly minor errors can compound and result in test-time infractions or collisions. This relationship is understudied, particularly across diverse closed-loop metrics and complex urban maneuvers. In this work, we revisit this undervalued question in policy evaluation through an extensive set of experiments across diverse conditions and metrics. Based on analysis in simulation, we find an even worse correlation between offline and online settings than reported by prior studies, casting doubts on the validity of current evaluation practices and metrics for driving policies. Next, we bridge the gap between offline and online evaluation. We investigate an offline metric based on epistemic uncertainty, which aims to capture events that are likely to cause errors in closed-loop settings. The resulting metric achieves over 13% improvement in correlation compared to previous offline metrics. We further validate the generalization of our findings beyond the simulation environment in real-world settings, where even greater gains are observed.
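The sketch below illustrates one common way an epistemic-uncertainty offline metric can be computed: a deep ensemble of driving policies is run over a ground-truth-annotated validation set, and each sample is scored by the disagreement among ensemble members. This is a minimal sketch under that assumption, not the paper's exact formulation; names such as `policies`, `val_loader`, and the `"observation"` key are illustrative placeholders.

```python
import torch


@torch.no_grad()
def epistemic_uncertainty_metric(policies, val_loader, device="cpu"):
    """Score each validation sample by the disagreement (variance) among
    an ensemble of independently trained driving policies."""
    scores = []
    for batch in val_loader:
        obs = batch["observation"].to(device)  # sensor / state input for the policy
        # Stack ensemble outputs: (n_models, batch_size, action_dim)
        preds = torch.stack([policy(obs) for policy in policies], dim=0)
        # Epistemic-uncertainty proxy: variance across ensemble members,
        # averaged over the action dimensions -> one score per sample.
        scores.append(preds.var(dim=0).mean(dim=-1).cpu())
    return torch.cat(scores)
```

A dataset-level offline score could then be the mean or an upper quantile of these per-sample uncertainties, so that the inputs on which the ensemble disagrees most, i.e., the events most likely to trigger closed-loop errors, dominate the metric.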
Problem

Research questions and friction points this paper is trying to address.

Bridging the gap between offline and online evaluation for autonomous driving
Improving the correlation between offline metrics and real-world driving performance
Developing uncertainty-based offline metrics that predict closed-loop driving failures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using epistemic uncertainty as the basis for an offline metric
Improving the correlation between offline and online evaluation (a worked correlation sketch follows this list)
Validating the metric in both simulation and real-world settings
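To make the correlation claim concrete, the sketch below shows the standard way offline-online agreement is typically measured: each candidate policy receives an offline score and an online closed-loop outcome, and the rank (and linear) correlation between the two is reported. The numbers are made up for illustration only; the paper's exact protocol may differ.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Offline metric per policy (e.g., mean epistemic uncertainty; lower is better)
offline_scores = np.array([0.12, 0.08, 0.31, 0.22, 0.15])
# Online outcome per policy from closed-loop rollouts (e.g., infractions per km)
online_infractions = np.array([0.9, 0.4, 2.7, 1.8, 1.1])

rho, _ = spearmanr(offline_scores, online_infractions)
r, _ = pearsonr(offline_scores, online_infractions)
print(f"Spearman rho = {rho:.2f}, Pearson r = {r:.2f}")
```

A reported improvement such as "over 13% higher correlation" then means the new offline metric's correlation with online outcomes exceeds that of the previous offline metrics by that margin under the same evaluation protocol.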
Animikh Aich
College of Engineering, Boston University, Boston, MA 02215, USA
Adwait Kulkarni
College of Engineering, Boston University, Boston, MA 02215, USA
Eshed Ohn-Bar
Assistant Professor, Boston University
intelligent systems · computer vision · accessibility · human-machine interaction · assistive technology