AI Summary
This paper addresses the joint optimization of passenger matching, vehicle rebalancing, and charging allocation for large-scale electric autonomous taxi (robo-taxi) fleets operating under stochastic demand. Formulated as an average-reward infinite-horizon Markov decision process (MDP), the problem suffers from exponential growth of the state and action spaces with fleet size. To tackle this, we propose an atomic action decomposition mechanism that drastically reduces the policy search space. Furthermore, we adapt the Proximal Policy Optimization (PPO) algorithm, applying it for the first time to coordinated fleet-wide decision-making and dynamic charging resource allocation. Extensive simulations on real-world New York City ride-hailing data demonstrate that our method achieves a long-term average revenue significantly closer to the fluid upper bound than the baselines. Quantitative analysis reveals critical trade-offs: vehicle battery range and charger power capacity exert substantial influence on system throughput and deadheading rate.
Abstract
Pioneering companies such as Waymo have deployed robo-taxi services in several U.S. cities. These robo-taxis are electric vehicles, and their operation requires the joint optimization of ride matching, vehicle repositioning, and charging scheduling in a stochastic environment. We model the operations of a ride-hailing system with robo-taxis as a discrete-time, average-reward Markov Decision Process with infinite horizon. As the fleet size grows, dispatching becomes challenging because the system state space and the fleet dispatching action space grow exponentially with the number of vehicles. To address this, we introduce a scalable deep reinforcement learning algorithm, called Atomic Proximal Policy Optimization (Atomic-PPO), that reduces the action space using atomic action decomposition. We evaluate our algorithm on real-world NYC for-hire vehicle data, measuring performance as the long-run average reward achieved by the dispatching policy relative to a fluid-based reward upper bound. Our experiments demonstrate the superior performance of Atomic-PPO compared to benchmarks. Furthermore, we conduct extensive numerical experiments to analyze the efficient allocation of charging facilities and to assess the impact of vehicle range and charger speed on fleet performance.
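To make the scalability argument concrete, the sketch below contrasts the size of a joint fleet action space with the per-step output space under an atomic decomposition, where the policy assigns one vehicle at a time. All numbers and names here are illustrative assumptions, not values from the paper.

```python
# Illustrative back-of-the-envelope comparison (assumed numbers, not from
# the paper): a joint policy must rank every combination of per-vehicle
# actions, while an atomically decomposed policy emits one small action
# distribution per vehicle assignment step.

n_vehicles = 100   # assumed fleet size
n_atomic = 12      # assumed per-vehicle choices: match a trip, reposition,
                   # or head to a charger

# Joint formulation: the action set is the Cartesian product over vehicles.
joint_actions = n_atomic ** n_vehicles

# Atomic decomposition: each decision step chooses among only n_atomic
# options, so the policy network's output layer stays fixed-size even as
# the fleet grows (it is invoked once per vehicle per time step).
atomic_actions = n_atomic

print(f"joint action space size:  {joint_actions:.3e}")
print(f"atomic action space size: {atomic_actions}")
print(f"decision steps per epoch: {n_vehicles}")
```

Under these assumed numbers, the joint space is on the order of 10^107 while the atomic policy only ever scores 12 options at a time, trading one exponentially large decision for a linear number of small ones.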