🤖 AI Summary
Existing VideoQA datasets suffer from geographic bias, viewpoint bias, and expert-driven annotation, which hinders modeling of the diverse, user-generated road-event narratives prevalent on global social media. To address this, we introduce RoadSocial, a large-scale VideoQA benchmark built from social-media road-event videos spanning 23 countries, varied camera viewpoints (CCTV, handheld, drone), and 12 challenging question-answering tasks. A scalable, social-comment-driven semi-automatic annotation framework leverages Text LLMs and Video LLMs to generate high-quality QA pairs, yielding 13.2K videos, 260K QA pairs, and 674 fine-grained semantic tags. We benchmark 18 state-of-the-art Video LLMs (open-source and proprietary, driving-specific and general-purpose) and demonstrate RoadSocial's utility in improving the road-event understanding capabilities of general-purpose Video LLMs.
📝 Abstract
We introduce RoadSocial, a large-scale, diverse VideoQA dataset tailored for generic road event understanding from social media narratives. Unlike existing datasets limited by regional bias, viewpoint bias, and expert-driven annotations, RoadSocial captures the global complexity of road events with varied geographies, camera viewpoints (CCTV, handheld, drones), and rich social discourse. Our scalable semi-automatic annotation framework leverages Text LLMs and Video LLMs to generate comprehensive question-answer pairs across 12 challenging QA tasks, pushing the boundaries of road event understanding. RoadSocial is derived from social media videos spanning 14M frames and 414K social comments, resulting in a dataset with 13.2K videos, 674 tags, and 260K high-quality QA pairs. We evaluate 18 Video LLMs (open-source and proprietary, driving-specific and general-purpose) on our road event understanding benchmark. We also demonstrate RoadSocial's utility in improving the road event understanding capabilities of general-purpose Video LLMs.
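To make the comment-driven annotation idea concrete, below is a minimal Python sketch of how a Text LLM and a Video LLM could be combined to draft and then filter QA pairs from a clip's social comments. The record fields and the `text_llm` / `video_llm_verify` callables are hypothetical placeholders for illustration only; they are not the paper's actual pipeline, prompts, or data schema.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical record types; field names are illustrative, not the RoadSocial schema.
@dataclass
class RoadEventClip:
    video_path: str
    comments: List[str]   # social-media discussion attached to the clip

@dataclass
class QAPair:
    question: str
    answer: str
    task: str              # e.g., one of the 12 QA task categories

def generate_qa_pairs(
    clip: RoadEventClip,
    text_llm: Callable[[str], List[QAPair]],
    video_llm_verify: Callable[[str, QAPair], bool],
) -> List[QAPair]:
    """Sketch of a comment-driven, semi-automatic annotation step:
    a Text LLM drafts QA pairs from the social discourse, and a
    Video LLM keeps only those it can ground in the video itself."""
    prompt = (
        "Given these viewer comments about a road event, write question-answer "
        "pairs covering description, causal reasoning, and outcome:\n"
        + "\n".join(clip.comments)
    )
    candidates = text_llm(prompt)                      # draft QAs from comments
    return [qa for qa in candidates
            if video_llm_verify(clip.video_path, qa)]  # keep video-grounded QAs
```

The two-stage structure (text-side drafting, video-side grounding) is one plausible reading of how the Text and Video LLMs are used "synergistically"; the actual prompts, task taxonomy, and filtering criteria are defined in the paper.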