🤖 AI Summary
This work addresses the degenerate, non-smooth stopping rule that arises in the continuous-time optimal stopping problem for American option pricing and in its associated reflected backward stochastic differential equation (RBSDE). The authors propose an entropy-regularized penalization scheme that yields a smooth approximation of the optimal stopping problem, which enables gradient-based optimization and promotes policy exploration. By analyzing the limit as the penalization parameter tends to infinity, they connect this regularization scheme to a new class of RBSDEs whose drivers have logarithmic singularities; to the best of the authors' knowledge, such equations have not been studied before. On the theoretical side, they prove well-posedness and convergence of the regularized scheme, and they establish existence and uniqueness of solutions for this class of singular RBSDEs via a monotone limit argument. On the numerical side, feasibility is illustrated in low-dimensional experiments based on policy iteration and least-squares Monte Carlo.
📝 Abstract
This paper extends our previous work in Chee et al. [9] to continuous-time optimal stopping problems, with a particular focus on American options within an exploratory framework. We pursue two main objectives. First, motivated by reinforcement learning applications, we introduce an entropy-regularized penalization scheme for continuous-time optimal stopping problems. The scheme is inspired by classical penalization techniques for reflected backward stochastic differential equations (RBSDEs) and provides a smooth approximation of the degenerate stopping rule inherent to the American option problem. This regularization promotes exploration, enables the use of gradient-based optimization methods, and leads naturally to policy improvement algorithms. We establish well-posedness and convergence properties of the scheme, and illustrate its numerical feasibility through low-dimensional experiments based on policy iteration and least-squares Monte Carlo methods. Second, from a theoretical perspective, we study the asymptotic limit of the entropy-regularized penalization as the penalization parameter tends to infinity. We show that the limiting value process solves a reflected BSDE with a logarithmically singular driver, and we prove existence and uniqueness of solutions to this new class of RBSDEs via a monotone limit argument. To the best of our knowledge, such equations have not previously been investigated in the literature.
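To give a feel for why entropy regularization smooths a stop-or-continue decision, the following is a minimal one-step sketch. It is not the paper's continuous-time scheme: the function name `regularized_stop`, the `temperature` parameter, and the single-decision setting are all assumptions made for illustration. For one choice between a stopping payoff g and a continuation value c, maximizing p·g + (1 − p)·c + λ·H(p) over the stopping probability p, with H the binary Shannon entropy, gives a logistic stopping probability and a softplus-smoothed maximum.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def regularized_stop(payoff, continuation, temperature):
    """One-step entropy-regularized stopping decision (illustrative only).

    Maximizes p*payoff + (1-p)*continuation + temperature*H(p) over p in (0, 1),
    where H is the binary Shannon entropy. Returns the optimal stopping
    probability and the corresponding regularized value.
    """
    a = (payoff - continuation) / temperature
    prob = sigmoid(a)                                   # randomized stopping rule
    value = continuation + temperature * softplus(a)    # smooth version of max(payoff, continuation)
    return prob, value


if __name__ == "__main__":
    for lam in (1.0, 0.1, 0.001):
        p, v = regularized_stop(payoff=2.0, continuation=1.0, temperature=lam)
        print(f"temperature={lam}: stop probability={p:.4f}, value={v:.4f}")
```

As the temperature shrinks, the stopping probability approaches the hard 0/1 rule and the value approaches max(g, c); for a positive temperature the rule stays randomized and differentiable in g and c, which is the property that makes gradient-based methods applicable in this toy setting.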