DRAMA-X: A Fine-grained Intent Prediction and Risk Reasoning Benchmark For Driving

📅 2025-06-21
🤖 AI Summary
This paper targets two gaps in short-term motion prediction for vulnerable road users (VRUs) in urban environments: the lack of safety-critical benchmarks and the lack of fine-grained intent reasoning. It introduces DRAMA-X, a driving-safety-oriented multi-task benchmark of 5,686 high-risk frames that supports joint evaluation of object detection, nine-class directional intent prediction, binary risk assessment, and ego-vehicle action recommendation. The authors frame safety reasoning as a structured “intent–risk–action” sequence and present SGG-Intent, a training-free framework that uses vision-language models (VLMs) for scene graph generation and a large language model (LLM) for compositional reasoning. Using automatically generated fine-grained annotations and a systematic evaluation of recent VLMs, they report that scene graph modeling improves intent and risk prediction, and they characterize the capabilities and limitations of current VLMs in safety-critical reasoning.

📝 Abstract
Understanding the short-term motion of vulnerable road users (VRUs) such as pedestrians and cyclists is critical for safe autonomous driving, especially in urban scenarios with ambiguous or high-risk behaviors. While vision-language models (VLMs) have enabled open-vocabulary perception, their utility for fine-grained intent reasoning remains underexplored. Notably, no existing benchmark evaluates multi-class intent prediction in safety-critical situations. To address this gap, we introduce DRAMA-X, a fine-grained benchmark constructed from the DRAMA dataset via an automated annotation pipeline. DRAMA-X contains 5,686 accident-prone frames labeled with object bounding boxes, a nine-class directional intent taxonomy, binary risk scores, expert-generated action suggestions for the ego vehicle, and descriptive motion summaries. These annotations enable a structured evaluation of four interrelated tasks central to autonomous decision-making: object detection, intent prediction, risk assessment, and action suggestion. As a reference baseline, we propose SGG-Intent, a lightweight, training-free framework that mirrors the ego vehicle's reasoning pipeline. It sequentially generates a scene graph from visual input using VLM-backed detectors, infers intent, assesses risk, and recommends an action using a compositional reasoning stage powered by a large language model. We evaluate a range of recent VLMs, comparing performance across all four DRAMA-X tasks. Our experiments demonstrate that scene-graph-based reasoning enhances intent prediction and risk assessment, especially when contextual cues are explicitly modeled.
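
To make the annotation types listed in the abstract concrete, here is a minimal Python sketch of how one labeled frame might be represented. It is an illustration only, not the released DRAMA-X format: the class and field names, the pixel-corner box convention, the choice to store risk and the action at frame level, and the placeholder intent names `intent_0` to `intent_8` are all assumptions, since the summary does not list the nine classes or the file schema.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

# Placeholder names only: the source says there are nine directional intent
# classes but does not name them, so these are hypothetical stand-ins.
INTENT_CLASSES = tuple(f"intent_{k}" for k in range(9))


@dataclass
class ObjectAnnotation:
    # Assumed convention: (x_min, y_min, x_max, y_max) in pixels.
    bbox: Tuple[float, float, float, float]
    category: str          # e.g. "pedestrian" or "cyclist"
    intent: str            # one of the nine directional intent classes
    motion_summary: str    # free-text description of the object's motion


@dataclass
class FrameAnnotation:
    frame_id: str
    objects: List[ObjectAnnotation] = field(default_factory=list)
    risk: bool = False             # binary risk label (assumed frame-level)
    suggested_action: str = ""     # action suggestion for the ego vehicle


def validate(frame: FrameAnnotation, intent_classes: Sequence[str]) -> List[str]:
    """Return a list of problems found in a record; empty means it looks well-formed."""
    problems = []
    for i, obj in enumerate(frame.objects):
        x_min, y_min, x_max, y_max = obj.bbox
        if x_max <= x_min or y_max <= y_min:
            problems.append(f"object {i}: degenerate bounding box")
        if obj.intent not in intent_classes:
            problems.append(f"object {i}: unknown intent {obj.intent!r}")
    return problems


# Usage with made-up values.
frame = FrameAnnotation(
    frame_id="example_frame",
    objects=[
        ObjectAnnotation(
            bbox=(10.0, 20.0, 60.0, 140.0),
            category="pedestrian",
            intent="intent_3",
            motion_summary="Pedestrian stepping off the curb.",
        )
    ],
    risk=True,
    suggested_action="slow down",
)
problems = validate(frame, INTENT_CLASSES)
```

Keeping the intent vocabulary as an explicit argument to `validate` means the same check works once the real class names are substituted for the placeholders.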

Problem

Research questions and friction points this paper is trying to address.

Predicting fine-grained intent of vulnerable road users
Assessing risk in safety-critical driving scenarios
Enabling autonomous decision-making via multi-task evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated annotation pipeline for fine-grained benchmark
Training-free SGG-Intent framework with VLM-backed detectors
Scene-graph-based reasoning enhances intent prediction
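
The staged flow described for SGG-Intent (scene graph first, then intent, risk, and action) can be sketched as a small Python skeleton. This is a hedged illustration of the control flow only, not the authors' implementation: `SceneGraph`, `graph_to_text`, `run_pipeline`, and the two callables are hypothetical names, and the stub functions stand in for the VLM-backed detector and the LLM, whose actual prompts and interfaces are not given in this summary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SceneGraph:
    nodes: List[str] = field(default_factory=list)
    # Each edge is a (subject, relation, object) triple.
    edges: List[Tuple[str, str, str]] = field(default_factory=list)


def graph_to_text(graph: SceneGraph) -> str:
    """Serialize the graph into plain text that a language model could read."""
    lines = ["Objects: " + ", ".join(graph.nodes)]
    lines += [f"{s} {r} {o}" for s, r, o in graph.edges]
    return "\n".join(lines)


def run_pipeline(
    image: object,
    build_graph: Callable[[object], SceneGraph],
    reason: Callable[[str, str], str],
) -> Dict[str, str]:
    """Run the stages in order, feeding each answer into the next stage's context."""
    graph = build_graph(image)                 # stage 1: scene graph from the image
    context = graph_to_text(graph)
    intent = reason("intent", context)         # stage 2: intent prediction
    context += f"\nIntent: {intent}"
    risk = reason("risk", context)             # stage 3: risk assessment
    context += f"\nRisk: {risk}"
    action = reason("action", context)         # stage 4: action suggestion
    return {"intent": intent, "risk": risk, "action": action}


# Stubs standing in for the real models (assumptions for illustration only).
def fake_build_graph(image: object) -> SceneGraph:
    return SceneGraph(
        nodes=["pedestrian_1", "ego"],
        edges=[("pedestrian_1", "in_front_of", "ego")],
    )


def fake_reason(task: str, context: str) -> str:
    canned = {"intent": "placeholder_intent", "risk": "risky", "action": "slow down"}
    return canned[task]


result = run_pipeline(None, fake_build_graph, fake_reason)
```

Passing the model calls in as plain callables keeps the skeleton training-free in the same spirit as the description: swapping in a different VLM or LLM changes only the two functions, not the staged structure.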