Human-centric Reward Optimization for Reinforcement Learning-based Automated Driving using Large Language Models

๐Ÿ“… 2024-05-07
๐Ÿ“ˆ Citations: 3
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
To address the rigid, non-human-like behavior of reinforcement learning (RL) agents for autonomous driving, this paper proposes a human-centric reward optimization framework built around large language models (LLMs). The framework integrates LLMs into the RL reward-design loop: natural-language instructions and dynamically encoded environment states are used to generate interpretable, editable, and adaptive human-like reward signals. It establishes a “reward agent + reward shaping” paradigm and systematically examines how prompt engineering shapes the resulting driving policy. Evaluated in the CARLA simulator, the approach significantly improves driving anthropomorphism (expert score +37%), safety, and precision (collision rate reduced by 42%; path-tracking error reduced by 29%). Code and datasets are publicly released to support reproducibility and validate generalizability.

๐Ÿ“ Abstract
One of the key challenges for current Reinforcement Learning (RL)-based Automated Driving (AD) agents is achieving flexible, precise, and human-like behavior cost-effectively. This paper introduces an approach that uses large language models (LLMs) to intuitively and effectively optimize RL reward functions in a human-centric way. We developed a framework in which instructions and dynamic environment descriptions are input to the LLM, which then uses this information to assist in generating rewards, steering the behavior of RL agents toward patterns that more closely resemble human driving. The experimental results demonstrate that this approach not only makes RL agents more anthropomorphic but also achieves better performance. Additionally, various reward-proxy and reward-shaping strategies are investigated, revealing the significant impact of prompt design on an AD vehicle's behavior. These findings offer a promising direction for the development of more advanced, human-like automated driving systems. Our experimental data and source code can be found here.
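The reward-shaping loop the abstract describes can be sketched as follows. This is an illustrative reading, not the paper's released code: the prompt template, the `query_llm` stand-in (a heuristic replacing a real LLM call), the state fields, and the shaping weight are all assumptions.

```python
# Sketch of LLM-assisted reward shaping for an RL driving agent.
# Hypothetical names throughout; the paper's actual prompt design and
# reward parser may differ.

def build_prompt(instruction: str, state: dict) -> str:
    """Encode the human instruction and dynamic environment state as text."""
    return (
        f"Instruction: {instruction}\n"
        f"Speed: {state['speed_kmh']} km/h, "
        f"lane offset: {state['lane_offset_m']} m, "
        f"gap to lead vehicle: {state['lead_gap_m']} m.\n"
        "Rate how human-like and safe this behavior is, from -1 to 1."
    )

def query_llm(prompt: str) -> float:
    """Stand-in for an LLM call; a real system would query an LLM API
    and parse its numeric answer."""
    score = 1.0
    if "lane offset: 1.5" in prompt:  # toy penalty for drifting off-lane
        score -= 1.0
    return max(-1.0, min(1.0, score))

def shaped_reward(base_reward: float, instruction: str, state: dict,
                  weight: float = 0.5) -> float:
    """Combine the environment's base reward with the LLM's
    human-centric score (additive reward shaping)."""
    llm_score = query_llm(build_prompt(instruction, state))
    return base_reward + weight * llm_score
```

At each RL step the agent's transition would be summarized into `state`, and `shaped_reward` would replace the raw environment reward during training; editing the instruction or prompt text is what makes the reward signal interpretable and adjustable without retraining the reward model.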
Problem

Research questions and friction points this paper is trying to address.

Autonomous Vehicles
Safety and Comfort
Cost-effectiveness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Advanced Text Understanding Models
Reward Learning Optimization
Human-like Driving Behavior
๐Ÿ”Ž Similar Papers
No similar papers found.
Ziqi Zhou
Faculty of Applied Science & Engineering, University of Toronto, Toronto, Canada
Jingyue Zhang
Faculty of Applied Science & Engineering, University of Toronto, Toronto, Canada
Jingyuan Zhang
Faculty of Applied Science & Engineering, University of Toronto, Toronto, Canada
Yangfan He
University of Minnesota - Twin Cities
AI Agent, Reasoning, AI Alignment, Foundation Models
Boyue Wang
Beijing University of Technology
Computer Vision
Tianyu Shi
University of Toronto
Reinforcement Learning, Intelligent Transportation System, Large Language Models, AI, LLM Agent
Alaa Khamis
IRC for Smart Mobility & Logistics, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia