DriveAgent: Multi-Agent Structured Reasoning with LLM and Multimodal Sensor Fusion for Autonomous Driving

📅 2025-05-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the challenge of tightly coupling situational understanding and real-time decision-making in autonomous driving, this paper proposes an LLM-driven modular multi-agent framework that fuses multimodal sensor data—including camera, LiDAR, GPS, and IMU streams. Perception, causal reasoning, urgency assessment, and decision-making are explicitly decoupled into specialized agents. A timestamp-driven event filtering mechanism and cross-modal collaborative reasoning further enable interpretable, traceable end-to-end driving inference. Crucially, this work pioneers deep integration of large language models (LLMs) into a multi-agent architecture to support structured semantic comprehension and dynamic causal inference. Evaluated on multiple challenging autonomous driving benchmarks, the framework achieves state-of-the-art performance in situational understanding accuracy, decision correctness, and response latency—significantly outperforming existing baselines across all three metrics.

📝 Abstract
We introduce DriveAgent, a novel multi-agent autonomous driving framework that leverages large language model (LLM) reasoning combined with multimodal sensor fusion to enhance situational understanding and decision-making. DriveAgent uniquely integrates diverse sensor modalities, including camera, LiDAR, GPS, and IMU, with LLM-driven analytical processes structured across specialized agents. The framework operates through a modular agent-based pipeline comprising four principal modules: (i) a descriptive analysis agent identifying critical sensor data events based on filtered timestamps, (ii) dedicated vehicle-level analysis conducted by LiDAR and vision agents that collaboratively assess vehicle conditions and movements, (iii) environmental reasoning and causal analysis agents explaining contextual changes and their underlying mechanisms, and (iv) an urgency-aware decision-generation agent prioritizing insights and proposing timely maneuvers. This modular design empowers the LLM to effectively coordinate specialized perception and reasoning agents, delivering cohesive, interpretable insights into complex autonomous driving scenarios. Extensive experiments on challenging autonomous driving datasets demonstrate that DriveAgent achieves superior performance on multiple metrics compared with baseline methods. These results validate the efficacy of the proposed LLM-driven multi-agent sensor fusion framework, underscoring its potential to substantially enhance the robustness and reliability of autonomous driving systems.
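The four-stage pipeline described in the abstract can be sketched as a minimal Python skeleton. All class and function names below are illustrative placeholders, not the authors' implementation; in DriveAgent each agent wraps an LLM call, whereas this sketch substitutes trivial rule-based stubs to show the data flow (timestamp filtering, then descriptive, vehicle-level, causal, and decision stages):

```python
from dataclasses import dataclass

@dataclass
class SensorEvent:
    timestamp: float
    modality: str   # "camera", "lidar", "gps", or "imu"
    payload: dict

def filter_events(events, window):
    # Timestamp-driven event filtering: keep only events inside the window of interest.
    start, end = window
    return [e for e in events if start <= e.timestamp <= end]

class DescriptiveAgent:
    # (i) Describes the filtered sensor events in natural language.
    def describe(self, events):
        return [f"{e.modality} event at t={e.timestamp}" for e in events]

class VehicleAnalysisAgent:
    # (ii) LiDAR/vision agents collaboratively assess vehicle-level conditions.
    def analyze(self, events):
        lidar = [e for e in events if e.modality == "lidar"]
        vision = [e for e in events if e.modality == "camera"]
        return {"lidar_hits": len(lidar), "vision_frames": len(vision)}

class CausalReasoningAgent:
    # (iii) Explains contextual changes and their underlying mechanisms.
    def explain(self, vehicle_state):
        if vehicle_state["lidar_hits"] > 0:
            return "slowdown likely caused by obstacle ahead"
        return "road appears clear"

class DecisionAgent:
    # (iv) Urgency-aware decision generation from the causal explanation.
    def decide(self, explanation):
        return "brake" if "obstacle" in explanation else "maintain speed"

def drive_agent_pipeline(events, window):
    # Orchestrate the four agents end to end over one time window.
    filtered = filter_events(events, window)
    DescriptiveAgent().describe(filtered)
    state = VehicleAnalysisAgent().analyze(filtered)
    explanation = CausalReasoningAgent().explain(state)
    return DecisionAgent().decide(explanation)
```

The stage boundaries mirror the decoupling the paper emphasizes: each agent consumes only the previous stage's output, which is what makes the inference trace interpretable.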
Problem

Research questions and friction points this paper is trying to address.

Enhancing autonomous driving decision-making with LLM and sensor fusion
Integrating multimodal sensors with LLM-driven analytical processes
Improving robustness of autonomous systems through modular agent design
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-driven multi-agent reasoning for autonomous driving
Multimodal sensor fusion with camera, LiDAR, GPS, IMU
Modular agent pipeline for interpretable decision-making
👥 Authors

Xinmeng Hou
Chang'an University, Xi'an, Shaanxi, China and Agency for Science, Technology and Research (A*STAR), Singapore

Wuqi Wang
Chang'an University, Xi'an, Shaanxi, China

Long Yang
Chang'an University, Xi'an, Shaanxi, China

Hao Lin
University of California, Davis, USA

Jinglun Feng
Apple | The City College of New York
Novel View Synthesis, 3D Vision, SLAM, Robotics, LLM

Haigen Min
Chang'an University, Xi'an, Shaanxi, China

Xiangmo Zhao
Chang'an University, Xi'an, Shaanxi, China