PADriver: Towards Personalized Autonomous Driving

📅 2025-05-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of jointly optimizing user intent understanding, interpretable decision-making, and regulatory compliance in personalized autonomous driving. To this end, we propose PADriver, a closed-loop framework that integrates multimodal large language models (MLLMs) with streaming video perception and introduces, for the first time, an explicit danger-level prediction mechanism to provide human-interpretable grounding for action decisions. We further construct PAD-Highway, the first closed-loop evaluation benchmark tailored for personalized driving, comprising 250 hours of high-quality annotated highway videos. Experiments demonstrate that PADriver consistently outperforms existing methods on PAD-Highway, enabling text-prompt-driven scene understanding, danger assessment, and generation of diverse, user-adapted driving policies. It achieves significant improvements in traffic-rule compliance, safety, and alignment with individual user preferences.

📝 Abstract
In this paper, we propose PADriver, a novel closed-loop framework for personalized autonomous driving (PAD). Built upon a Multi-modal Large Language Model (MLLM), PADriver takes streaming frames and personalized textual prompts as inputs. It autoregressively performs scene understanding, danger level estimation, and action decision. The predicted danger level reflects the risk of the potential action and provides an explicit reference for the final action, which corresponds to the preset personalized prompt. Moreover, we construct a closed-loop benchmark named PAD-Highway, based on the Highway-Env simulator, to comprehensively evaluate decision performance under traffic rules. The dataset contains 250 hours of video with high-quality annotations to facilitate the development of PAD behavior analysis. Experimental results on the constructed benchmark show that PADriver outperforms state-of-the-art approaches on different evaluation metrics and enables various driving modes.
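The three-stage decision loop described in the abstract (scene understanding, danger level estimation, action decision) can be illustrated with a toy sketch. This is not the authors' implementation: the MLLM is replaced by a hand-written heuristic, and `MODE_TOLERANCE`, `PREFERENCE`, `estimate_danger`, `step_gap`, and all numeric values are hypothetical names and numbers chosen for illustration. Only the five action names are taken from Highway-Env's discrete meta-actions.

```python
from dataclasses import dataclass
from typing import Dict, List

# Action names mirror Highway-Env's discrete meta-actions.
ACTIONS = ("LANE_LEFT", "IDLE", "LANE_RIGHT", "FASTER", "SLOWER")

# Hypothetical danger tolerance per driving mode (illustrative, not from the paper).
MODE_TOLERANCE = {"cautious": 0.2, "normal": 0.5, "aggressive": 0.8}

# Hypothetical preference order: actions that make the most progress come first.
PREFERENCE = ("FASTER", "IDLE", "LANE_LEFT", "LANE_RIGHT", "SLOWER")


@dataclass
class Decision:
    scene: str                 # textual scene description
    danger: Dict[str, float]   # danger level per candidate action, in [0, 1]
    action: str                # final action chosen for this step


def estimate_danger(gap_m: float) -> Dict[str, float]:
    """Toy stand-in for the MLLM's danger estimate (assumption, not the paper's model)."""
    closeness = max(0.0, min(1.0, 1.0 - gap_m / 100.0))
    return {
        "LANE_LEFT": 0.3,
        "IDLE": closeness,
        "LANE_RIGHT": 0.3,
        "FASTER": min(1.0, closeness + 0.3),
        "SLOWER": max(0.0, closeness - 0.4),
    }


def decide(gap_m: float, mode: str) -> Decision:
    scene = f"lead vehicle {gap_m:.0f} m ahead"   # stage 1: scene understanding (stub)
    danger = estimate_danger(gap_m)               # stage 2: danger level estimation
    tolerance = MODE_TOLERANCE[mode]
    # Stage 3: take the most preferred action whose danger the mode tolerates,
    # falling back to the least dangerous action when none qualifies.
    acceptable = [a for a in PREFERENCE if danger[a] <= tolerance]
    action = acceptable[0] if acceptable else min(danger, key=danger.get)
    return Decision(scene, danger, action)


def step_gap(gap_m: float, action: str) -> float:
    """Toy environment: the chosen action changes the gap seen at the next step."""
    if action == "FASTER":
        return max(0.0, gap_m - 20.0)
    if action == "SLOWER":
        return min(100.0, gap_m + 20.0)
    if action in ("LANE_LEFT", "LANE_RIGHT"):
        return 100.0  # assume the neighbouring lane is clear
    return gap_m


def run_closed_loop(initial_gap_m: float, mode: str, steps: int = 5) -> List[str]:
    gap, actions = initial_gap_m, []
    for _ in range(steps):
        action = decide(gap, mode).action
        actions.append(action)
        gap = step_gap(gap, action)  # the action feeds back into the next observation
    return actions
```

With a clear road (`gap_m=100.0`), `decide` returns `FASTER` in the `"aggressive"` mode and `IDLE` in the `"cautious"` mode, which shows how the same danger estimate can yield different actions under different personalized settings. In the actual framework, both the danger levels and the mapping from prompt to action would come from the MLLM rather than from fixed thresholds.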
Problem

Research questions and friction points this paper is trying to address.

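Develops personalized autonomous driving with an MLLM that takes streaming frames and personalized textual prompts as inputs
Evaluates closed-loop decision performance under traffic rules via the PAD-Highway benchmark
Outperforms state-of-the-art methods on PAD-Highway while enabling various driving modes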
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses a Multi-modal Large Language Model (MLLM)
Integrates personalized textual prompts
Closed-loop benchmark PAD-Highway for evaluation
Authors

Genghua Kou
Beijing Institute of Technology
Fan Jia
Faculty of Chemistry and Biochemistry, Ruhr-University of Bochum
Organic Chemistry
Weixin Mao
Waseda University
Yingfei Liu
Megvii Technology
Yucheng Zhao
Megvii Technology
Robot, Large Language Model, Video Generation
Ziheng Zhang
Megvii Technology
Osamu Yoshie
Waseda University
Tiancai Wang
Dexmal
Computer Vision, Embodied AI
Ying Li
Beijing Institute of Technology
Xiangyu Zhang
Megvii Technology