🤖 AI Summary
This paper addresses replication portfolio construction in incomplete markets under non-convex frictions such as non-convex transaction costs, capital constraints, and regulatory limitations. We propose an AlphaZero-inspired framework for hedging as an alternative to deep hedging, a gradient-descent method widely used in industry. Our method models hedging as a two-player game between an investor and the market and combines Monte Carlo tree search with deep reinforcement learning. On the theoretical side, we connect deep hedging to convex optimization, which suggests that its effectiveness is contingent on convexity assumptions. In market environments constructed to expose this limitation, deep hedging converges to local optima, whereas the AlphaZero-based system consistently finds near-optimal replication strategies. The experiments further suggest that AlphaZero is more sample-efficient, an important advantage in data-scarce, overfitting-prone derivative markets.
📝 Abstract
This paper examines replication portfolio construction in incomplete markets, a key problem in financial engineering with applications in pricing, hedging, balance sheet management, and energy storage planning. We model this as a two-player game between an investor and the market, in which the investor makes strategic bets on future states and the market reveals outcomes. Inspired by the success of Monte Carlo Tree Search in stochastic games, we introduce an AlphaZero-based system and compare its performance to deep hedging, a widely used industry method based on gradient descent. Through theoretical analysis and experiments, we show that deep hedging struggles, converging to local optima, in environments where the $Q$-function is not subject to convexity constraints, such as those involving non-convex transaction costs, capital constraints, or regulatory limitations. We construct specific market environments to highlight these limitations and demonstrate that AlphaZero consistently finds near-optimal replication strategies. On the theoretical side, we establish a connection between deep hedging and convex optimization, suggesting that its effectiveness is contingent on convexity assumptions. Our experiments further suggest that AlphaZero is more sample-efficient, an important advantage in data-scarce, overfitting-prone derivative markets.
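The local-optimum issue described in the abstract can be illustrated with a minimal sketch. This is not the paper's environment or method: the one-dimensional objective, the Gaussian cost bump, and all constants and function names below are hypothetical choices made for illustration, and a plain grid search stands in for a global search procedure (it is not Monte Carlo tree search or AlphaZero). The sketch only shows how a gradient-based update can stall in front of a non-convex cost term while an exhaustive search over the same objective does not.

```python
import math

# All constants are illustrative assumptions, not values from the paper.
TARGET = 2.0        # position that would replicate the liability exactly
BUMP_HEIGHT = 4.0   # height of the hypothetical non-convex cost bump
BUMP_CENTER = 1.0   # position at which the cost bump peaks
BUMP_WIDTH = 0.1    # controls how narrow the bump is


def loss(a):
    """Toy objective: squared replication error plus a non-convex cost bump."""
    bump = BUMP_HEIGHT * math.exp(-((a - BUMP_CENTER) ** 2) / BUMP_WIDTH)
    return (a - TARGET) ** 2 + bump


def grad(a):
    """Analytic derivative of loss with respect to the position a."""
    bump = BUMP_HEIGHT * math.exp(-((a - BUMP_CENTER) ** 2) / BUMP_WIDTH)
    return 2.0 * (a - TARGET) - bump * 2.0 * (a - BUMP_CENTER) / BUMP_WIDTH


def gradient_descent(a0=0.0, lr=0.01, steps=5000):
    """Plain gradient descent starting from an unhedged position a0."""
    a = a0
    for _ in range(steps):
        a -= lr * grad(a)
    return a


def grid_search(lo=-3.0, hi=3.0, n=601):
    """Exhaustive search over an evenly spaced grid of candidate positions."""
    candidates = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
    return min(candidates, key=loss)


a_gd = gradient_descent()
a_grid = grid_search()
print(f"gradient descent: a={a_gd:.3f}, loss={loss(a_gd):.3f}")
print(f"grid search:      a={a_grid:.3f}, loss={loss(a_grid):.3f}")
```

In this toy setting, gradient descent started at the unhedged position settles to the left of the cost bump, while the grid search reaches the region near the target position with a much lower loss. Exhaustive search does not scale to sequential, high-dimensional hedging problems, which is the setting where a tree-search method would be the relevant alternative.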