🤖 AI Summary
This study addresses a critical safety gap in Advanced Driver Assistance Systems (ADAS) during takeover transitions, where real-world, takeover-centric multimodal data has been lacking. To bridge this gap, the authors introduce ADAS-TO, the first large-scale naturalistic driving dataset dedicated to ADAS takeovers, comprising 15,659 twenty-second takeover segments from 327 drivers, with synchronized front-view video and CAN logs. The dataset uniquely distinguishes between planned, driver-initiated terminations (Ego) and forced takeovers (Non-ego). By integrating kinematic filtering with semantic annotations generated via vision-language models, the study reveals that actionable visual cues precede 59.3% of safety-critical takeovers by at least three seconds, demonstrating the potential for semantics-driven early warning systems. The dataset is publicly released to support further research.
📝 Abstract
Takeovers remain a key safety vulnerability in production ADAS, yet existing public resources rarely provide takeover-centered, real-world data. We present ADAS-TO, the first large-scale naturalistic dataset dedicated to ADAS-to-manual transitions, containing 15,659 takeover-centered 20s clips from 327 drivers across 22 vehicle brands. Each clip synchronizes front-view video with CAN logs. Takeovers are defined as ADAS ON $\rightarrow$ OFF transitions, with the primary trigger labeled as brake, steer, gas, mixed, or system disengagement. We further separate planned driver-initiated terminations (Ego) from forced takeovers (Non-ego) using a rule-based partition. While most events occur within conservative kinematic margins, we identify a long tail of 285 safety-critical cases. For these events, we combine kinematic screening with vision-language model (VLM) annotation to attribute hazards and relate them to intervention dynamics. The resulting cross-modal analysis shows distinct kinematic signatures across traffic dynamics, infrastructure degradation, and adverse environments, and finds that in 59.3% of critical cases, actionable visual cues emerge at least 3s before takeover, supporting the potential for semantics-aware early warning beyond late-stage kinematic triggers. The dataset is publicly released at huggingface.co/datasets/HenryYHW/ADAS-TO-Sample.
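To make the trigger taxonomy concrete, the sketch below shows one plausible rule-based labeling of a takeover's primary trigger from CAN signals at the ADAS ON → OFF transition. All signal names, the `torque_threshold` parameter, and the thresholds themselves are illustrative assumptions, not the paper's actual rules or values.

```python
# Hedged sketch: classify the primary takeover trigger (brake, steer, gas,
# mixed, or system disengagement) from CAN signals sampled at the moment of
# the ADAS ON -> OFF transition. Signal names and thresholds are assumed.

def label_trigger(brake_pressed: bool, gas_pressed: bool,
                  steering_torque: float, torque_threshold: float = 2.0) -> str:
    """Return one of 'brake', 'steer', 'gas', 'mixed', or 'system'."""
    # Treat steering as a trigger only above an (assumed) torque threshold in Nm.
    steer = abs(steering_torque) > torque_threshold
    actions = [name for name, active in
               (("brake", brake_pressed), ("gas", gas_pressed), ("steer", steer))
               if active]
    if len(actions) > 1:
        return "mixed"        # multiple simultaneous driver inputs
    if len(actions) == 1:
        return actions[0]     # single dominant driver input
    return "system"           # no driver input: system disengagement

# Example: driver brakes and steers at the same disengagement instant
print(label_trigger(brake_pressed=True, gas_pressed=False, steering_torque=3.5))
# -> mixed
```

A rule of this shape would also support the paper's Ego/Non-ego partition, e.g. by combining the trigger label with context such as whether the vehicle was approaching a planned exit, though the actual partition rules are not specified in the abstract.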