🤖 AI Summary
Reinforcement learning (RL) for autonomous vehicle control suffers from reward engineering bias, while imitation learning relies on scarce, high-quality expert action data. Method: We propose a reward-free end-to-end RL framework (RFRLF) that eliminates both explicit reward signals and expert action labels. It introduces a target state prediction network (TSPN) and a reward-free state-guided policy network (RFSGPN), optimizing the policy by minimizing the state prediction error relative to target states (e.g., desired trajectory points) rather than using reward or action supervision. Contribution/Results: This work establishes a purely target-state-driven control paradigm that requires only environmental observations and target-state supervision, obviating reward shaping and expert demonstrations entirely. Evaluated on standard vehicle control benchmarks, our method improves sample efficiency and policy robustness, enabling effective and stable autonomous driving learning even in reward-absent settings.
📝 Abstract
Reinforcement learning plays a crucial role in vehicle control by guiding agents to learn optimal control strategies through designed or learned reward signals. However, in vehicle control applications, rewards typically must be designed manually while accounting for multiple implicit factors, which easily introduces human biases. Although imitation learning methods do not rely on explicit reward signals, they require high-quality expert actions, which are often difficult to acquire. To address these issues, we propose a reward-free reinforcement learning framework (RFRLF). This framework directly learns target states to optimize agent behavior through a target state prediction network (TSPN) and a reward-free state-guided policy network (RFSGPN), avoiding dependence on manually designed reward signals. Specifically, the policy network is learned by minimizing the difference between the predicted state and the expert state. Experimental results demonstrate the effectiveness of the proposed RFRLF in controlling vehicle driving, showing its advantages in improving learning efficiency and adapting to reward-free environments.
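The core idea above can be sketched in a few lines: instead of a reward, the training signal is the squared error between a predicted next state and a supervised target state. The snippet below is a minimal illustration of that objective only; the linear dynamics (standing in for the learned TSPN), the linear policy (standing in for the RFSGPN), and all numbers are assumptions for this sketch, not the paper's actual networks.

```python
import numpy as np

# Assumed known one-step dynamics s' = A s + B a, standing in for the
# learned target state prediction network (TSPN).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])

def predict_next(s, a):
    """Predict the next state from the current state and action."""
    return A @ s + B @ a

# Linear policy a = W s, standing in for the reward-free state-guided
# policy network (RFSGPN). No reward is ever computed: the training
# signal is the squared error between the predicted next state and a
# supervised target state (e.g., a desired trajectory point).
W = np.zeros((1, 2))
s = np.array([1.0, 0.0])          # current observation
s_target = np.array([1.0, -0.5])  # target (expert) state

lr = 0.5
for _ in range(200):
    a = W @ s
    err = predict_next(s, a) - s_target       # state prediction error
    grad_W = B.T @ err[:, None] @ s[None, :]  # gradient of 0.5 * ||err||^2
    W -= lr * grad_W

final_loss = 0.5 * np.sum((predict_next(s, W @ s) - s_target) ** 2)
```

In this toy linear case the policy converges so that the predicted next state matches the target exactly; in the full framework the TSPN itself is a learned network and the policy is optimized through it in the same state-matching fashion, with gradients supplied by backpropagation rather than the hand-derived expression used here.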