🤖 AI Summary
High-dimensional continuous-time multi-asset portfolio optimization suffers from the "curse of dimensionality": computational cost grows rapidly in the combined number of assets and state variables, rendering conventional Hamilton–Jacobi–Bellman (HJB) PDE and dynamic programming (DP) approaches intractable.
Method: We propose a direct policy optimization framework grounded in Pontryagin's Maximum Principle (PMP). Our approach introduces a novel two-stage PMP-guided algorithm: (i) backpropagation through time (BPTT) for stable costate estimation; and (ii) derivation of a near-optimal closed-form control law, bypassing high-dimensional HJB PDEs and DP recursion. Policies are parameterized by neural networks and trained directly on simulated state-dependent dynamics; a sketch of the two-stage idea follows below.
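To make the two-stage idea concrete, here is a minimal single-asset sketch in PyTorch. It is an illustration under simplified Merton-style dynamics, not the paper's full multi-asset, multi-state algorithm; names such as `policy_net` and all market parameters are hypothetical. Stage 1 runs a short BPTT warm-up and extracts a costate estimate λ_t = ∂J/∂W_t via autograd; Stage 2 uses the fact that, under CRRA utility, the costate-implied control collapses to the closed-form Merton ratio.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
mu, r, sigma, gamma = 0.08, 0.02, 0.2, 3.0   # assumed single-asset market parameters
T, n_steps, n_paths = 1.0, 50, 4096
dt = T / n_steps

# Hypothetical policy network: maps (t, W) to the risky-asset fraction pi.
policy_net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(policy_net.parameters(), lr=1e-3)

def crra_utility(w):
    return w.pow(1.0 - gamma) / (1.0 - gamma)

def simulate(costate_step=None):
    """Euler-Maruyama rollout of wealth; optionally mark W at one step as a
    leaf tensor so autograd can return the costate lambda = dJ/dW there."""
    w = torch.ones(n_paths, 1)
    w_leaf = None
    for k in range(n_steps):
        if k == costate_step:
            w = w.detach().clone().requires_grad_(True)
            w_leaf = w
        t = torch.full((n_paths, 1), k * dt)
        pi = policy_net(torch.cat([t, w.detach()], dim=1))
        dz = torch.randn(n_paths, 1) * dt ** 0.5
        w = w * (1.0 + (r + pi * (mu - r)) * dt + pi * sigma * dz)
        w = w.clamp_min(1e-6)
    return w, w_leaf

# Stage 1: short BPTT warm-up, directly ascending expected terminal utility.
for _ in range(200):
    w_T, _ = simulate()
    loss = -crra_utility(w_T).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Costate estimate at an intermediate time: lambda_i = dJ/dW_i
# (summing the utilities makes autograd return per-path costates).
w_T, w_mid = simulate(costate_step=n_steps // 2)
lam = torch.autograd.grad(crra_utility(w_T).sum(), w_mid)[0]

# Stage 2: with CRRA utility, lambda is proportional to W^{-gamma}, so the
# costate-implied risk aversion recovers gamma and the closed-form
# Pontryagin control reduces to the Merton ratio.
pi_star = (mu - r) / (gamma * sigma ** 2)
print(f"mean costate estimate: {lam.mean().item():.4f}, closed-form pi*: {pi_star:.3f}")
```

The key design point mirrored here is that the costate comes from a single autograd call on the simulated path, so no value-function grid or PDE solve is ever needed.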
Contribution/Results: The method scales to complex settings with up to 50 assets and 10 state variables while achieving near-optimal performance relative to theoretical benchmarks. Experiments demonstrate superior computational efficiency over existing PDE/DP and reinforcement-learning (RL) baselines, while retaining interpretability and scalability.
📝 Abstract
Solving large-scale, continuous-time portfolio optimization problems involving numerous assets and state-dependent dynamics has long been challenged by the curse of dimensionality. Traditional dynamic programming and PDE-based methods, while rigorous, typically become computationally intractable beyond a small number of state variables (often limited to ~3–6 in prior numerical studies). To overcome this critical barrier, we introduce the *Pontryagin-Guided Direct Policy Optimization* (PG-DPO) framework. PG-DPO leverages Pontryagin's Maximum Principle to directly guide neural network policies via backpropagation through time, naturally incorporating exogenous state processes without requiring dense state grids. Crucially, our computationally efficient "Two-Stage" variant exploits rapidly stabilizing costate estimates derived from BPTT, converting them into near-optimal closed-form Pontryagin controls after only a short warm-up, significantly reducing training overhead. This enables a breakthrough in scalability: numerical experiments demonstrate that PG-DPO successfully tackles problems with dimensions previously considered far out of reach, optimizing portfolios with up to 50 assets and 10 state variables. The framework delivers near-optimal policies, offering a practical and powerful alternative for high-dimensional continuous-time portfolio choice.
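As a rough illustration of the Stage-2 step at the scale the abstract mentions, the sketch below assembles a Merton-type closed-form Pontryagin control for 50 assets from assumed costate estimates. The market parameters and the symbols `lam` (costate) and `lam_W` (its wealth sensitivity) are illustrative placeholders, not the paper's notation; the control form π* = −(λ / (W λ_W)) Σ⁻¹(μ − r) is the standard Merton-type expression this family of results reduces to.

```python
import numpy as np

n_assets = 50
rng = np.random.default_rng(0)
mu = 0.03 + 0.05 * rng.random(n_assets)        # assumed per-asset drifts
r = 0.02                                       # assumed risk-free rate
A = rng.standard_normal((n_assets, n_assets)) / np.sqrt(n_assets)
Sigma = A @ A.T + 0.05 * np.eye(n_assets)      # assumed well-conditioned covariance

# Assumed costate estimates, e.g. from a BPTT warm-up. Under CRRA with
# gamma = 3 and W = 1: lam = W^{-gamma} = 1, lam_W = -gamma * W^{-gamma-1} = -3.
W, lam, lam_W = 1.0, 1.0, -3.0
risk_tolerance = -lam / (W * lam_W)            # equals 1/gamma under CRRA

# Closed-form control: a single linear solve, no grid or PDE in sight.
pi_star = risk_tolerance * np.linalg.solve(Sigma, mu - r)
print(pi_star[:5])
```

The point of the sketch is the cost profile: once costates stabilize, each control evaluation is one n×n linear solve, which is what makes the 50-asset scale reported above plausible.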