🤖 AI Summary
This work addresses the challenge of evaluating how safety-relevant each entity in a traffic scene is to the ego vehicle, a task poorly served by existing detection and saliency methods and by conventional scene graphs, which do not explicitly model hazard perception. The authors propose a safety-oriented scene graph generation task centered on the ego vehicle. The accompanying framework combines visual features, semantic information, traffic accident data, and depth cues to capture how prominent hazards affect the ego vehicle and where they are located relative to it. To support the task, the authors add relational annotations to the Cityscapes dataset and develop a generation framework whose output graphs color-code hazard severity and annotate each hazard's effect mechanism and relative location. The model is evaluated on 10 tasks from 5 perspectives, and the comparative experiments and ablation studies support its effectiveness in ego-centric hazard perception and scene understanding.
📝 Abstract
Maintaining situational awareness in complex driving scenarios is challenging. It requires continuously prioritizing attention among a large number of scene entities and understanding how prominent hazards might affect the ego vehicle. While existing studies excel at detecting specific semantic categories and visually salient regions, they cannot assess safety relevance. Meanwhile, existing scene graphs model only generic spatial predicates, either for foreground objects alone or for all scene entities, which is inadequate for driving scenarios. To bridge this gap, we introduce a novel task, Traffic Scene Graph Generation, which captures traffic-specific relations between prominent hazards and the ego vehicle. We propose a novel framework that explicitly uses traffic accident data and depth cues to supplement visual features and semantic information for reasoning. The output traffic scene graphs provide intuitive guidance that highlights prominent hazards by color-coding their severity and annotating their effect mechanism and location relative to the ego vehicle. We create relational annotations on the Cityscapes dataset and evaluate our model on 10 tasks from 5 perspectives. The results of comparative experiments and ablation studies demonstrate the model's capacity for ego-centric reasoning in hazard-aware traffic scene understanding.
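To make the output format concrete, the following is a minimal illustrative sketch of how an ego-centric traffic scene graph might be represented. It is not the authors' implementation: the class names, the three-level severity scale, the color mapping, and the example mechanism and location strings are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Severity(Enum):
    # Hypothetical three-level scale and colors; the paper's actual
    # severity levels and color scheme are not specified in the abstract.
    LOW = "green"
    MEDIUM = "yellow"
    HIGH = "red"


# Lower rank means the hazard is listed first.
_RANK = {Severity.HIGH: 0, Severity.MEDIUM: 1, Severity.LOW: 2}


@dataclass
class HazardEdge:
    """One relation from a prominent hazard to the ego vehicle."""
    entity: str         # hazard category, e.g. "pedestrian"
    severity: Severity  # drives the color coding
    mechanism: str      # how the hazard may affect the ego vehicle
    location: str       # position relative to the ego vehicle


@dataclass
class TrafficSceneGraph:
    """Star-shaped graph: every edge points at the single ego node."""
    edges: List[HazardEdge] = field(default_factory=list)

    def add(self, edge: HazardEdge) -> None:
        self.edges.append(edge)

    def ranked(self) -> List[HazardEdge]:
        # Stable sort, so equal-severity hazards keep insertion order.
        return sorted(self.edges, key=lambda e: _RANK[e.severity])

    def describe(self) -> List[str]:
        return [
            f"[{e.severity.value}] {e.entity} --{e.mechanism} @ {e.location}--> ego"
            for e in self.ranked()
        ]


# Example scene with made-up hazards.
graph = TrafficSceneGraph()
graph.add(HazardEdge("parked car", Severity.LOW, "narrows lane", "right"))
graph.add(HazardEdge("pedestrian", Severity.HIGH, "may cross ego path", "front-left"))
for line in graph.describe():
    print(line)
```

The star-shaped layout reflects the abstract's focus on relations between hazards and the ego vehicle rather than relations among arbitrary pairs of scene entities.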