Revisiting Zeroth-Order Hessian Approximation: A Single-Step Policy Optimization Lens

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high variance and low sample efficiency of estimating the Hessian matrix and its inverse in high-dimensional zeroth-order (ZO) optimization. By reframing the problem as single-step policy optimization, it establishes a theoretical equivalence between general ZO Hessian estimators and the Hessian of a smoothed objective function, and shows that several classical randomized estimators correspond to particular baseline choices. On this basis it introduces ZoVH, a unified suite of variance-reduced estimators for the full Hessian, its regularized inverse, and the bias-corrected inverse Hessian-gradient product. ZoVH uses an optimal baseline derived to provably minimize estimator variance and a query reuse mechanism that incorporates historical function queries to improve sample efficiency. The paper proves unbiasedness of the Hessian estimator, variance optimality of the baseline, error bounds for the suite, and convergence of the resulting curvature-aware ZO algorithm. Experiments show that ZoVH outperforms existing methods in both Hessian estimation accuracy and optimization convergence speed.

📝 Abstract

Accurate Zeroth-Order (ZO) Hessian estimation is a cornerstone of derivative-free methods, essential for tasks such as bilevel optimization, Bayesian inference, and uncertainty quantification. However, obtaining a complete suite of low-variance estimators for the Hessian and its inverse in high-dimensional settings remains a significant challenge. To address this, we propose a unified framework that reinterprets ZO Hessian approximation through the lens of single-step Policy Optimization (PO). This perspective establishes a theoretical equivalence between general ZO Hessian estimators and the Hessian of a smoothed PO objective, unifying distinct classical randomized estimators as specific instances of baseline selection. Building on this foundation, we introduce ZoVH, a comprehensive suite of variance-reduced estimators for the full Hessian matrix, its regularized inverse, and the bias-corrected inverse Hessian-gradient product. ZoVH leverages two key techniques: (1) a unique optimal baseline derived to provably minimize variance, and (2) a query reuse strategy that incorporates historical function queries to enhance sample efficiency without inflating costs. Our rigorous theoretical analysis confirms the unbiasedness of the Hessian estimator, validates the variance optimality of our baseline, provides error bounds for the entire ZoVH suite, and establishes convergence guarantees for the resulting curvature-aware ZO algorithm. Extensive empirical results validate our theoretical findings, demonstrating that ZoVH achieves superior estimation accuracy and convergence performance in real-world applications. Code is available at https://github.com/Qjbtiger/ZoVH
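
To make the role of a baseline concrete, here is a minimal sketch of the generic Gaussian-smoothing ZO Hessian estimator with a constant baseline. It is not the ZoVH estimator and does not use the paper's optimal baseline; the function name `zo_hessian`, the test objective, and every parameter value are assumptions chosen for illustration. The sketch relies on the standard identity ∇²f_μ(x) = E[f(x + μu)(uuᵀ − I)] / μ² for u ~ N(0, I), together with E[uuᵀ − I] = 0, so subtracting any constant b from the function value leaves the estimator unbiased while changing its variance.

```python
import numpy as np

def zo_hessian(f, x, mu, n_samples, baseline, rng):
    """Monte Carlo estimate of the Hessian of the Gaussian-smoothed f at x.

    Uses mean[(f(x + mu*u) - baseline) * (u u^T - I)] / mu^2 with u ~ N(0, I).
    `baseline` is any constant that does not depend on u.
    """
    d = x.size
    u = rng.standard_normal((n_samples, d))      # random directions
    w = (f(x + mu * u) - baseline) / mu**2       # baseline-shifted weights
    outer_sum = np.einsum("n,ni,nj->ij", w, u, u)  # sum_n w_n * u_n u_n^T
    return (outer_sum - w.sum() * np.eye(d)) / n_samples

# Hypothetical test objective: a quadratic plus a large constant offset.
# The offset does not change the Hessian, but it inflates the variance
# of the estimator when no baseline is subtracted.
A = np.diag([1.0, 2.0, 3.0])
OFFSET = 100.0

def f(z):
    # Accepts a single point (d,) or a batch of points (n, d).
    return 0.5 * np.einsum("...i,ij,...j->...", z, A, z) + OFFSET

x = np.array([0.1, -0.2, 0.3])
mu, n = 0.5, 100_000

# Same seed for both runs so that only the baseline differs.
H_plain = zo_hessian(f, x, mu, n, baseline=0.0, rng=np.random.default_rng(0))
H_base = zo_hessian(f, x, mu, n, baseline=float(f(x)), rng=np.random.default_rng(0))

# For a quadratic, the smoothed Hessian equals A exactly, so A is the target.
err_plain = np.linalg.norm(H_plain - A)
err_base = np.linalg.norm(H_base - A)
print(f"Frobenius error without baseline: {err_plain:.3f}")
print(f"Frobenius error with baseline f(x): {err_base:.3f}")
```

Using `f(x)` as the baseline costs one extra query and is only one of the classical choices; the abstract's claim is that a different, optimal baseline reduces variance further, which this sketch does not attempt to reproduce.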

Problem

Research questions and friction points this paper is trying to address.

Zeroth-Order

Hessian approximation

high-dimensional

variance reduction

derivative-free optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Zeroth-Order Optimization

Hessian Approximation

Policy Optimization