nuReasoning: A Reasoning-Centric Dataset and Benchmark for Long-Tail Autonomous Driving

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of effective supervision for commonsense reasoning, spatial relationships, and decision inference in long-tail scenarios within existing autonomous driving datasets. The authors introduce nuReasoning, a large-scale, real-world driving dataset and benchmark that targets long-tail situations. It comprises 20,000 multimodal driving clips, each 20 seconds long, collected across multiple cities. The clips carry three types of human-verified reasoning annotations: spatial, decision, and counterfactual. These annotations are combined with multi-camera images, LiDAR, high-definition maps, and object annotations, so both reasoning and planning can be trained and evaluated with vision-language models (VLMs) and vision-language-action models (VLAs). Experiments show that VLMs fine-tuned on the dataset substantially improve driving-specific question-answering accuracy. VLAs trained with reasoning supervision also plan better, even when their textual reasoning outputs are disabled at inference time.

📝 Abstract

Reasoning is essential for autonomous driving (AD) in long-tail scenarios, where vehicles must apply commonsense knowledge, understand spatial relations, infer agent interactions, and make safe decisions. However, existing AD datasets and benchmarks mainly target perception, prediction, or planning, and provide limited supervision for reasoning over realistic long-tail driving scenes. We introduce nuReasoning, a large-scale real-world dataset and benchmark for reasoning-centric AD. Following the lineage of nuScenes and nuPlan, nuReasoning advances real-world AD datasets and benchmarks toward reasoning in long-tail driving scenarios. The dataset contains 20,000 clips, each 20 seconds long, collected across multiple cities, with synchronized multi-camera images, LiDAR data, HD maps, object annotations, and human-verified reasoning annotations spanning Spatial Reasoning, Decision Reasoning, and Counterfactual Reasoning. Unlike prior datasets that focus primarily on visual question answering, nuReasoning supports both reasoning evaluation and planning evaluation, enabling a direct study of how reasoning supervision affects driving performance. Experiments show that fine-tuning VLMs on nuReasoning substantially improves driving-specific question answering, while incorporating reasoning supervision into VLA training improves planning performance even when textual reasoning outputs are disabled at inference time. These results establish nuReasoning as a foundation for evaluating and improving robust, interpretable, reasoning-driven AD systems in realistic long-tail settings.
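To make the dataset composition above concrete, here is a minimal sketch of how one clip record could be represented in Python. It is an illustration only. The class names (`Clip`, `ReasoningAnnotation`, `ReasoningType`), all field names, and the helper `total_hours` are hypothetical and do not reflect the official nuReasoning file format or devkit. The sketch also works through the arithmetic of the stated scale: 20,000 clips of 20 seconds each is 400,000 seconds, or roughly 111 hours of driving.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class ReasoningType(Enum):
    # The three annotation categories named in the abstract.
    SPATIAL = "spatial"
    DECISION = "decision"
    COUNTERFACTUAL = "counterfactual"


@dataclass
class ReasoningAnnotation:
    # Hypothetical fields: one human-verified question/answer pair.
    kind: ReasoningType
    question: str
    answer: str
    human_verified: bool = True


@dataclass
class Clip:
    # Hypothetical layout of one clip; field names are illustrative, not the official schema.
    clip_id: str
    city: str
    duration_s: float = 20.0  # the abstract states each clip is 20 seconds long
    camera_frames: dict[str, list[str]] = field(default_factory=dict)  # camera name -> image paths
    lidar_sweeps: list[str] = field(default_factory=list)  # LiDAR sweep file paths
    map_name: str = ""  # identifier of the associated HD map
    objects: list[dict] = field(default_factory=list)  # object annotations
    reasoning: list[ReasoningAnnotation] = field(default_factory=list)

    def by_type(self, kind: ReasoningType) -> list[ReasoningAnnotation]:
        # Return only the reasoning annotations of the requested category.
        return [r for r in self.reasoning if r.kind is kind]


def total_hours(num_clips: int, clip_seconds: float) -> float:
    # Total recorded duration in hours for a set of equal-length clips.
    return num_clips * clip_seconds / 3600.0


clip = Clip(
    "clip_0001",
    "city_a",
    reasoning=[
        ReasoningAnnotation(ReasoningType.SPATIAL, "Which lane is the cyclist in?", "The right turn lane."),
        ReasoningAnnotation(ReasoningType.DECISION, "Should the ego vehicle yield?", "Yes."),
    ],
)
print(len(clip.by_type(ReasoningType.SPATIAL)))  # 1
print(round(total_hours(20_000, 20), 1))  # 111.1
```

Keeping each reasoning category as an enum value makes it easy to filter annotations per category, which mirrors how the abstract groups them into Spatial, Decision, and Counterfactual Reasoning.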

Problem

Research questions and friction points this paper is trying to address.

reasoning, autonomous driving, long-tail scenarios, dataset, benchmark

Innovation

Methods, ideas, or system contributions that make the work stand out.

reasoning-centric, long-tail scenarios, autonomous driving