Quantum-Efficient Reinforcement Learning Solutions for Last-Mile On-Demand Delivery

📅 2025-08-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the large-scale Capacitated Pickup and Delivery Problem with Time Windows (CPDPTW), aiming to minimize total travel time in last-mile on-demand delivery. We propose a quantum-enhanced reinforcement learning framework built around a problem-specific Parameterized Quantum Circuit (PQC) that combines a problem-driven encoding with entangling and variational layers. The framework is compared in numerical experiments against Proximal Policy Optimization (PPO) and Quantum Singular Value Transformation (QSVT). Relative to these baselines, it achieves improvements in solution quality and convergence speed, including on instances with over one thousand nodes, while remaining computationally feasible. To the best of our knowledge, this is the first approach that jointly models quantum circuit design, RL training dynamics, and realistic logistical constraints. The empirical evaluation points to the potential of near-term noisy intermediate-scale quantum (NISQ) devices for combinatorial optimization in logistics.

📝 Abstract
Quantum computation has emerged as a promising alternative for solving NP-hard combinatorial problems. In optimization in particular, classical approaches become intractable as the problem size grows. We investigate quantum computing for solving the large-scale Capacitated Pickup and Delivery Problem with Time Windows (CPDPTW). To this end, a Reinforcement Learning (RL) framework augmented with a Parameterized Quantum Circuit (PQC) is designed to minimize travel time in realistic last-mile on-demand delivery. A novel problem-specific encoding quantum circuit with entangling and variational layers is proposed. Moreover, Proximal Policy Optimization (PPO) and Quantum Singular Value Transformation (QSVT) are implemented for comparison in numerical experiments, which highlight the superiority of the proposed method in terms of solution scale and training complexity while incorporating real-world constraints.
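To make the encoding, entangling, and variational layers concrete, the following is a minimal NumPy statevector sketch of a small PQC used as a policy head. It is not the authors' circuit: the RY angle encoding, the CNOT ring, reading basis-state probabilities as action probabilities, and the helper names (`ry`, `apply_single`, `apply_cnot`, `pqc_policy`) are all assumptions made for illustration.

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation (real-valued 2x2 matrix)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def apply_single(state, gate, qubit, n):
    """Apply a 2x2 gate to one qubit of an n-qubit statevector."""
    psi = state.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [qubit]))
    psi = np.moveaxis(psi, 0, qubit)  # put the acted-on axis back in place
    return psi.reshape(-1)

def apply_cnot(state, control, target, n):
    """Flip the target qubit on the subspace where control == 1."""
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[control] = 1
    sub = psi[tuple(idx)]  # control axis is dropped here
    t = target - 1 if target > control else target
    psi[tuple(idx)] = np.flip(sub, axis=t).copy()
    return psi.reshape(-1)

def pqc_policy(features, weights):
    """Hypothetical PQC: encode -> entangle -> variational -> probabilities.

    Assumes len(features) == len(weights) >= 2 (one angle per qubit).
    """
    n = len(features)
    state = np.zeros(2 ** n)
    state[0] = 1.0                                   # start in |0...0>
    for q in range(n):                               # encoding layer (assumed RY)
        state = apply_single(state, ry(features[q]), q, n)
    for q in range(n):                               # entangling layer (assumed CNOT ring)
        state = apply_cnot(state, q, (q + 1) % n, n)
    for q in range(n):                               # variational layer (trainable angles)
        state = apply_single(state, ry(weights[q]), q, n)
    return state ** 2                                # amplitudes are real in this sketch

probs = pqc_policy(features=[0.3, 1.1, 2.0], weights=[0.1, -0.4, 0.7])
```

Here `probs` is a distribution over the 2^n basis states. In an RL loop the `weights` would be the trainable parameters, while `features` would be derived from the routing state. How the paper actually maps CPDPTW states to circuit angles is not specified in this summary.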
Problem

Research questions and friction points this paper is trying to address.

Solving large-scale CPDPTW using quantum computing
Minimizing travel time in last-mile delivery via RL
Comparing PPO and QSVT for quantum-enhanced optimization
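
As a rough illustration of the constraints that the CPDPTW name implies (vehicle capacity, pickup before delivery, and time windows), the sketch below scores a single route by travel time and rejects it if any constraint is violated. The `Stop` record, the `route_travel_time` function, zero service times, and waiting on early arrival are assumptions for illustration, not the paper's formulation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class Stop:
    node: int        # index into the travel-time matrix
    request: int     # id shared by a pickup and its delivery
    demand: int      # +q at a pickup, -q at the matching delivery
    earliest: float  # time window opens
    latest: float    # time window closes

def route_travel_time(
    route: Sequence[Stop],
    travel: Sequence[Sequence[float]],
    capacity: int,
    depot: int = 0,
) -> Optional[float]:
    """Return total travel time of a depot-to-depot route, or None if infeasible."""
    load, clock, total, here = 0, 0.0, 0.0, depot
    picked = set()
    for stop in route:
        leg = travel[here][stop.node]
        total += leg
        clock = max(clock + leg, stop.earliest)  # wait if arriving early
        if clock > stop.latest:
            return None                          # time window missed
        if stop.demand > 0:
            picked.add(stop.request)
        elif stop.request not in picked:
            return None                          # delivery before its pickup
        load += stop.demand
        if load > capacity:
            return None                          # vehicle over capacity
        here = stop.node
    if load != 0:
        return None                              # something picked up but never delivered
    return total + travel[here][depot]

travel = [[0, 2, 4], [2, 0, 3], [4, 3, 0]]
pickup = Stop(node=1, request=7, demand=1, earliest=0, latest=10)
dropoff = Stop(node=2, request=7, demand=-1, earliest=0, latest=10)
cost = route_travel_time([pickup, dropoff], travel, capacity=1)
```

A learned policy, classical or quantum, would build such routes step by step. A check like this can mask infeasible actions or define the reward.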
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantum-augmented Reinforcement Learning for delivery optimization
Problem-specific quantum encoding with variational layers
PPO and QSVT comparison for scalable solutions
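
For readers unfamiliar with PPO, which appears above as one of the compared methods, its standard clipped surrogate objective can be sketched in a few lines. This is the textbook form, not the paper's training code. The function name `ppo_clip_objective` and the default `eps=0.2` are illustrative choices.

```python
import numpy as np

def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean clipped surrogate: E[min(r * A, clip(r, 1 - eps, 1 + eps) * A)]."""
    ratio = np.exp(np.asarray(new_logp) - np.asarray(old_logp))  # r = pi_new / pi_old
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    return float(np.mean(np.minimum(unclipped, clipped)))

# The action probability doubled (0.25 -> 0.5) with a positive advantage,
# so the ratio of 2.0 is clipped to 1 + eps.
value = ppo_clip_objective(np.log([0.5]), np.log([0.25]), [1.0])
```

The clipping limits how far a single update can move the policy away from the one that collected the data. The log-probabilities can come from any policy, including one parameterized by a quantum circuit.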