🤖 AI Summary
Traffic accident severity prediction faces three major challenges: missing data, high-dimensional feature interdependence, and extreme class imbalance—particularly the scarcity of high-severity instances. Existing single-model or black-box prompting approaches suffer from poor generalizability and limited interpretability. This paper proposes a modular multi-agent collaborative reasoning framework that integrates a rule-based engine with a large language model (LLM) consensus mechanism. Leveraging plug-and-play traditional model components and fine-grained modular prompt engineering, the framework ensures semantically traceable decision-making. Evaluated on real-world datasets from the UK and US, it achieves near 90% accuracy—substantially outperforming conventional models and state-of-the-art prompting methods such as Chain-of-Thought (CoT) and Tree-of-Thought (ToT). To our knowledge, this is the first approach for accident severity prediction that simultaneously delivers high accuracy, strong robustness, and rigorous interpretability, thereby redefining the performance frontier for this task.
📝 Abstract
Accident severity prediction plays a critical role in transportation safety systems but remains a persistently difficult task due to incomplete data, strong feature dependencies, and severe class imbalance in which rare but high-severity cases are underrepresented and hard to detect. Existing methods often rely on monolithic models or black-box prompting, which struggle to scale in noisy, real-world settings and offer limited interpretability. To address these challenges, we propose MARBLE, a multi-agent rule-based LLM engine that decomposes the severity prediction task across a team of specialized reasoning agents, including an interchangeable ML-backed agent. Each agent focuses on a semantic subset of features (e.g., spatial, environmental, temporal), enabling scoped reasoning and modular prompting without the risk of prompt saturation. Predictions are coordinated through either rule-based or LLM-guided consensus mechanisms that account for class rarity and confidence dynamics. The system retains structured traces of agent-level reasoning and coordination outcomes, supporting in-depth interpretability and post-hoc performance diagnostics. Across both UK and US datasets, MARBLE consistently outperforms traditional machine learning classifiers and state-of-the-art (SOTA) prompt-based reasoning methods, including Chain-of-Thought (CoT), Least-to-Most (L2M), and Tree-of-Thought (ToT), achieving nearly 90% accuracy where others plateau below 48%. This performance redefines the practical ceiling for accident severity classification under real-world noise and extreme class imbalance. Our results position MARBLE as a generalizable and interpretable framework for reasoning under uncertainty in safety-critical applications.
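To make the coordination step concrete, the sketch below illustrates one way a rule-based consensus over per-agent votes could weight confidence by class rarity, as the abstract describes. It is a minimal illustration, not MARBLE's actual implementation: the agent names, vote format, and rarity weights are all invented for this example.

```python
from collections import defaultdict

# Hypothetical agent outputs: each agent predicts a severity class with a
# confidence, based only on its own semantic feature subset (spatial,
# environmental, temporal) or an ML-backed classifier.
agent_votes = {
    "spatial":       {"class": "serious", "confidence": 0.62},
    "environmental": {"class": "slight",  "confidence": 0.55},
    "temporal":      {"class": "serious", "confidence": 0.71},
    "ml_backed":     {"class": "slight",  "confidence": 0.80},
}

# Illustrative rarity weights: rarer, high-severity classes get a boost so
# they are not drowned out by the majority class (values are made up).
rarity_weight = {"slight": 1.0, "serious": 2.5, "fatal": 5.0}

def rule_based_consensus(votes, rarity):
    """Aggregate votes, scoring each class by sum(confidence * rarity)."""
    scores = defaultdict(float)
    for vote in votes.values():
        scores[vote["class"]] += vote["confidence"] * rarity[vote["class"]]
    return max(scores, key=scores.get)

# "serious" wins here (3.325) over "slight" (1.35) despite the split vote,
# because the rarity weighting amplifies the high-severity agents.
print(rule_based_consensus(agent_votes, rarity_weight))
```

In a full system, the LLM-guided variant would replace this fixed scoring rule with a coordinator prompt that inspects the agents' reasoning traces before arbitrating.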