🤖 AI Summary
This work addresses two obstacles in continuous control: the sample inefficiency of deep reinforcement learning caused by high exploration costs, and the computational expense and strong dynamics assumptions of model-based approaches. The authors propose Hybrid Energy-Aware Reward Shaping (H-EARS), a novel method that integrates lightweight physical priors into model-free reinforcement learning. By decomposing potential-energy functions and introducing energy-aware action regularization, H-EARS achieves functional disentanglement between task objectives and energy constraints. The approach requires only linear-complexity approximations and offers theoretical guarantees inspired by Lyapunov stability, along with a bounded-error analysis. Experiments on multiple benchmarks and vehicle simulations demonstrate that H-EARS significantly improves convergence speed, policy stability, and energy efficiency, supporting its industrial applicability even under extreme conditions.
📝 Abstract
Deep reinforcement learning excels at continuous control but often requires extensive exploration, while physics-based models demand complete dynamics equations and incur cubic computational complexity. This study proposes Hybrid Energy-Aware Reward Shaping (H-EARS), which unifies potential-based reward shaping with energy-aware action regularization. H-EARS constrains action magnitudes while balancing task-specific and energy-based potentials via functional decomposition, achieving linear complexity O(n) by capturing the dominant energy components without a full dynamics model. We establish a theoretical foundation including: (1) functional independence enabling separate task/energy optimization; (2) energy-based convergence acceleration; (3) convergence guarantees under function approximation; and (4) error bounds for approximate potentials. Connections to Lyapunov stability are analyzed as heuristic guides. Experiments against multiple baselines show improved convergence, stability, and energy efficiency. Vehicle simulations validate applicability to safety-critical domains under extreme conditions. The results confirm that integrating lightweight physics priors enhances model-free RL without complete system models, enabling transfer from laboratory research to industrial applications.
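To make the core idea concrete, the following is a minimal sketch of potential-based reward shaping combined with an energy-aware action penalty, in the spirit of H-EARS. The specific potentials, weights, and function names here are illustrative assumptions, not the paper's exact formulation: the task potential is a negative distance to a goal, the energy potential is a kinetic-energy term, and both are folded into the standard shaping form r' = r + γΦ(s') − Φ(s) plus a quadratic action penalty.

```python
import numpy as np

# Hypothetical constants, not taken from the paper.
GAMMA = 0.99                 # discount factor
W_TASK, W_ENERGY = 1.0, 0.5  # weights balancing the two potentials
LAMBDA_ACTION = 0.01         # strength of the action-magnitude regularizer

def task_potential(state):
    """Task potential: negative distance of position (state[0]) to a goal at 0."""
    return -abs(state[0])

def energy_potential(state):
    """Dominant mechanical-energy term: negative kinetic energy 0.5*m*v^2 (m = 1).
    A linear-time approximation; no full dynamics model is required."""
    return -0.5 * state[1] ** 2

def combined_potential(state):
    """Functional decomposition: a weighted sum of task and energy potentials."""
    return W_TASK * task_potential(state) + W_ENERGY * energy_potential(state)

def shaped_reward(reward, state, next_state, action):
    """r' = r + gamma*Phi(s') - Phi(s) - lambda*||a||^2.
    The potential-based term preserves the optimal policy; the quadratic
    action penalty discourages energy-hungry controls."""
    shaping = GAMMA * combined_potential(next_state) - combined_potential(state)
    action_penalty = LAMBDA_ACTION * float(np.sum(np.square(action)))
    return reward + shaping - action_penalty
```

Because the two potentials enter only through a weighted sum, W_TASK and W_ENERGY can be tuned independently, which mirrors the task/energy disentanglement the paper emphasizes.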