🤖 AI Summary
Addressing the challenge of tightly coupling situational understanding and real-time decision-making in autonomous driving, this paper proposes an LLM-driven modular multi-agent framework that fuses multimodal sensor data—including camera, LiDAR, GPS, and IMU streams. Perception, causal reasoning, urgency assessment, and decision-making are explicitly decoupled into specialized agents. A timestamp-driven event filtering mechanism and cross-modal collaborative reasoning further enable interpretable, traceable end-to-end driving inference. Crucially, this work pioneers deep integration of large language models (LLMs) into a multi-agent architecture to support structured semantic comprehension and dynamic causal inference. Evaluated on multiple challenging autonomous driving benchmarks, the framework achieves state-of-the-art performance in situational understanding accuracy, decision correctness, and response latency—significantly outperforming existing baselines across all three metrics.
📝 Abstract
We introduce DriveAgent, a novel multi-agent autonomous driving framework that leverages large language model (LLM) reasoning combined with multimodal sensor fusion to enhance situational understanding and decision-making. DriveAgent uniquely integrates diverse sensor modalities, including camera, LiDAR, GPS, and IMU, with LLM-driven analytical processes structured across specialized agents. The framework operates through a modular agent-based pipeline comprising four principal modules: (i) a descriptive analysis agent identifying critical sensor data events based on filtered timestamps, (ii) dedicated vehicle-level analysis conducted by LiDAR and vision agents that collaboratively assess vehicle conditions and movements, (iii) environmental reasoning and causal analysis agents explaining contextual changes and their underlying mechanisms, and (iv) an urgency-aware decision-generation agent prioritizing insights and proposing timely maneuvers. This modular design empowers the LLM to effectively coordinate specialized perception and reasoning agents, delivering cohesive, interpretable insights into complex autonomous driving scenarios. Extensive experiments on challenging autonomous driving datasets demonstrate that DriveAgent achieves superior performance on multiple metrics against baseline methods. These results validate the efficacy of the proposed LLM-driven multi-agent sensor fusion framework, underscoring its potential to substantially enhance the robustness and reliability of autonomous driving systems.
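The four-module pipeline described in the abstract can be pictured as a staged hand-off: timestamp-filtered events flow through descriptive, vehicle-level, causal, and decision agents in turn. The following is a minimal Python sketch of that control flow only; all class and field names are hypothetical, the agent bodies are placeholders for the paper's actual LLM-driven reasoning, and none of this reflects DriveAgent's real implementation.

```python
from dataclasses import dataclass

# Hypothetical sensor event record; field names are illustrative, not from the paper.
@dataclass
class SensorEvent:
    timestamp: float
    modality: str  # e.g. "camera", "lidar", "gps", "imu"
    payload: dict

def filter_events(events, window_start, window_end):
    """Timestamp-driven event filtering: keep only events inside the critical window."""
    return [e for e in events if window_start <= e.timestamp <= window_end]

class DriveAgentPipeline:
    """Toy pipeline mirroring the four modules; each method stands in for an agent."""

    def describe(self, events):
        # (i) descriptive analysis: summarize which modalities fired in the window
        return sorted({e.modality for e in events})

    def vehicle_analysis(self, events):
        # (ii) LiDAR and vision agents collaborate on vehicle-level state
        lidar = [e for e in events if e.modality == "lidar"]
        camera = [e for e in events if e.modality == "camera"]
        return {"lidar_hits": len(lidar), "camera_frames": len(camera)}

    def causal_analysis(self, description):
        # (iii) environmental/causal reasoning (placeholder for an LLM call)
        return "context change involves: " + ", ".join(description)

    def decide(self, vehicle_state, explanation):
        # (iv) urgency-aware decision generation, prioritized by perceived risk
        urgency = "high" if vehicle_state["lidar_hits"] > 0 else "low"
        maneuver = "slow_down" if urgency == "high" else "maintain"
        return {"urgency": urgency, "maneuver": maneuver, "rationale": explanation}

    def run(self, events, window):
        window_events = filter_events(events, *window)
        desc = self.describe(window_events)
        state = self.vehicle_analysis(window_events)
        explanation = self.causal_analysis(desc)
        return self.decide(state, explanation)
```

The point of the sketch is the decoupling: each stage consumes only the previous stage's structured output, so any single agent (e.g. the causal-analysis placeholder) can be swapped for an LLM-backed component without touching the others.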