🤖 AI Summary
This paper addresses continuous-time stochastic optimal control problems with Poisson jumps over a finite horizon. We propose a dual-network cooperative framework grounded in the dynamic programming (DP) principle: a policy network parameterizes the optimal control law, while a value network approximates the viscosity solution of a decoupled Hamilton–Jacobi–Bellman (HJB) equation. Crucially, we embed the continuous-time DP principle directly into the deep learning loss function—bypassing both Markov chain discretization and PDE spatial grid discretization for the first time. The method accommodates non-Gaussian, high-dimensional (≥50D) jump-diffusion systems, overcoming the curse of dimensionality inherent in conventional numerical PDE methods. We validate our approach on multidimensional financial derivative hedging and inventory control tasks, demonstrating substantial improvements in policy accuracy and achieving two orders of magnitude higher computational efficiency compared to classical PDE-based solvers.
📝 Abstract
In this paper, we introduce a model-based deep-learning approach to solve finite-horizon continuous-time stochastic control problems with jumps. We iteratively train two neural networks: one to represent the optimal policy and the other to approximate the value function. Leveraging a continuous-time version of the dynamic programming principle, we derive two different training objectives based on the Hamilton-Jacobi-Bellman equation, ensuring that the networks capture the underlying stochastic dynamics. Empirical evaluations on different problems illustrate the accuracy and scalability of our approach, demonstrating its effectiveness in solving complex, high-dimensional stochastic control tasks.