Curvature-Aligned Probing for Local Loss-Landscape Stabilization

📅 2026-04-16
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing measures of local loss-landscape stabilization use their probing directions inefficiently, because those directions are mismatched to the strongly anisotropic structure of neural network landscapes. This work reframes stabilization as an observational problem and introduces a curvature-aligned probing criterion, Δ₂^(D), that probes the loss increment field in the top-D eigenspace of the empirical Hessian near a trained solution. The authors construct a unified family of probing criteria, parameterized by an aggregation order and a probing distribution, and prove from a local quadratic model that Δ₂^(D) preserves the O(k⁻²) mean-squared rate of the full-space criterion while replacing dependence on ambient-dimension curvature with dependence on the subspace dimension D. They also derive a closed-form spectral expression and show that the top-D eigenspace is extremal within the eigenspace-aligned family. Scalable estimators are built from Hessian-vector products, subspace Monte Carlo sampling, and a closed-form Gaussian-moment proxy. On a decoder-only transformer, a curvature-aligned probe occupying a tiny fraction of parameter space reproduces the full-space mean-squared signal to within numerical noise throughout the validated local regime, and the closed-form estimator is orders of magnitude faster than direct Monte Carlo once the subspace has been constructed.
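
To make the subspace-construction step concrete, here is a minimal sketch of how a top-D curvature subspace might be extracted using only Hessian-vector products. It is not the paper's implementation. The toy diagonal quadratic loss, the central-difference `hvp`, and the helper names (`grad_quadratic`, `top_eigenspace`) are all assumptions made for illustration. The sketch uses plain orthogonal iteration, which finds the eigenvalues of largest magnitude, so it assumes the dominant curvature directions are positive.

```python
import math
import random


def grad_quadratic(theta, diag):
    """Gradient of the toy loss 0.5 * sum(d_i * theta_i^2) (illustrative stand-in)."""
    return [d * t for d, t in zip(diag, theta)]


def hvp(grad_fn, theta, v, eps=1e-4):
    """Central-difference Hessian-vector product; never forms the Hessian."""
    plus = grad_fn([t + eps * x for t, x in zip(theta, v)])
    minus = grad_fn([t - eps * x for t, x in zip(theta, v)])
    return [(p - m) / (2.0 * eps) for p, m in zip(plus, minus)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(vectors):
    """Modified Gram-Schmidt; assumes the inputs are linearly independent."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = dot(w, b)
            w = [x - coeff * y for x, y in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        basis.append([x / norm for x in w])
    return basis


def top_eigenspace(grad_fn, theta, D, iters=200, seed=0):
    """Orthogonal iteration: repeatedly apply H to D vectors and re-orthonormalize."""
    rng = random.Random(seed)
    n = len(theta)
    basis = orthonormalize(
        [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(D)]
    )
    for _ in range(iters):
        basis = orthonormalize([hvp(grad_fn, theta, b) for b in basis])
    # Rayleigh quotients b^T H b approximate the leading eigenvalues.
    eigvals = [dot(b, hvp(grad_fn, theta, b)) for b in basis]
    return eigvals, basis


if __name__ == "__main__":
    diag = [5.0, 3.0, 1.0, 0.5, 0.1]  # hypothetical curvature spectrum
    theta_star = [0.0] * len(diag)    # stand-in for a trained solution
    eigvals, basis = top_eigenspace(
        lambda th: grad_quadratic(th, diag), theta_star, D=2
    )
    print(eigvals)
```

In a real network the gradient function would come from automatic differentiation and the Hessian-vector product from a double-backward or forward-over-reverse pass. The finite-difference version here only keeps the sketch dependency-free.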

📝 Abstract
Local loss-landscape stabilization under sample growth is typically measured either pointwise or through isotropic averaging in the full parameter space. Despite practical value, both choices probe directions that contribute little to the dominant local deformation of strongly anisotropic neural landscapes. We recast stabilization as an observational problem and introduce a unified family of criteria parameterized by an aggregation order and a probing distribution; within this family we propose a curvature-aligned criterion $\Delta_2^{(D)}$ that probes the loss increment field in the top-$D$ eigenspace of the empirical Hessian near a trained solution. Solely from a local quadratic model, we prove that $\Delta_2^{(D)}$ preserves the $O(k^{-2})$ mean-squared rate of the full-space criterion while replacing ambient-dimension curvature dependence with dependence on the subspace dimension $D$; a corollary gives a closed-form spectral expression and a proposition identifies the top-$D$ eigenspace as extremal within the eigenspace-aligned family. We also derive scalable estimators based on Hessian-vector products, subspace Monte Carlo, and a closed-form Gaussian-moment proxy. On a decoder-only transformer, a curvature-aligned probe occupying a tiny fraction of parameter space already reproduces the full-space mean-squared signal to within numerical noise throughout the validated local regime, and the closed-form estimator is orders of magnitude faster than direct Monte Carlo after subspace construction.
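
The contrast between subspace Monte Carlo and a closed-form Gaussian-moment estimate can be illustrated with a generic quadratic model. The sketch below is hedged throughout: the abstract does not give the exact definition of the criterion, so the increment model `delta(u) = c + g·u + 0.5 * sum(a_i * u_i^2)`, the Gaussian probing distribution with scale `sigma`, and all names (`c`, `g`, `eigvals`, `sigma`) are assumptions rather than the paper's notation. Under those assumptions, with `u` drawn from N(0, sigma² I) in the eigenbasis of a D-dimensional subspace, standard Gaussian moments give the mean square `(c + 0.5 sigma² Σ a_i)² + sigma² |g|² + 0.5 sigma⁴ Σ a_i²`.

```python
import random


def closed_form_mean_square(c, g, eigvals, sigma):
    """E[delta(u)^2] for u ~ N(0, sigma^2 I), using Gaussian second and fourth moments."""
    trace = sum(eigvals)
    sum_sq = sum(a * a for a in eigvals)
    mean = c + 0.5 * sigma ** 2 * trace          # E[delta(u)]
    linear_var = sigma ** 2 * sum(x * x for x in g)
    quad_var = 0.5 * sigma ** 4 * sum_sq         # Var of the quadratic term
    return mean ** 2 + linear_var + quad_var


def monte_carlo_mean_square(c, g, eigvals, sigma, n_samples=200_000, seed=0):
    """Direct sampling estimate of the same quantity, for comparison."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        u = [rng.gauss(0.0, sigma) for _ in eigvals]
        delta = (
            c
            + sum(gi * ui for gi, ui in zip(g, u))
            + 0.5 * sum(a * ui * ui for a, ui in zip(eigvals, u))
        )
        total += delta * delta
    return total / n_samples


if __name__ == "__main__":
    # Hypothetical values for a D = 3 subspace.
    c, g, eigvals, sigma = 0.01, [0.1, -0.05, 0.02], [4.0, 2.0, 1.0], 0.1
    print(closed_form_mean_square(c, g, eigvals, sigma))
    print(monte_carlo_mean_square(c, g, eigvals, sigma))
```

The closed form costs O(D) once the subspace quantities are known, whereas the sampling estimate needs many draws to reach comparable accuracy. This is the kind of gap the abstract's speed comparison refers to, though the paper's actual estimators may differ in detail.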
Problem

Research questions and friction points this paper is trying to address.

loss landscape
local stabilization
curvature alignment
anisotropic neural landscapes
empirical Hessian
Innovation

Methods, ideas, or system contributions that make the work stand out.

curvature-aligned probing
loss landscape stabilization
empirical Hessian
subspace estimation
scalable estimators