AdaCubic: An Adaptive Cubic Regularization Optimizer for Deep Learning

📅 2026-04-10
🏛️ Trans. Mach. Learn. Res.
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of conventional adaptive optimizers, which often rely heavily on hyperparameter tuning and struggle to simultaneously achieve strong convergence and generalization. The authors propose a scalable adaptive cubic-regularized optimizer that dynamically adjusts the regularization strength in Newton-type updates by solving an auxiliary optimization problem with a cubic constraint. To mitigate computational costs, the method leverages the Hutchinson estimator for efficient approximation of the Hessian matrix. As the first successful application of cubic regularization to large-scale deep learning, the proposed optimizer guarantees local convergence without requiring meticulous hyperparameter tuning. Empirical results across computer vision, natural language processing, and signal processing benchmarks demonstrate that, with a fixed set of hyperparameters, it consistently matches or outperforms state-of-the-art optimizers.

📝 Abstract
A novel regularization technique, AdaCubic, is proposed that adapts the weight of the cubic term. The heart of AdaCubic is an auxiliary optimization problem with cubic constraints that dynamically adjusts the weight of the cubic term in Newton's cubic-regularized method. We use Hutchinson's method to approximate the Hessian matrix, thereby reducing computational cost. We demonstrate that AdaCubic inherits the local convergence guarantees of the cubically regularized Newton method. Our experiments on Computer Vision, Natural Language Processing, and Signal Processing tasks demonstrate that AdaCubic outperforms or competes with several widely used optimizers. Unlike other adaptive algorithms that require hyperparameter fine-tuning, AdaCubic is evaluated with a fixed set of hyperparameters, rendering it a highly attractive option for researchers and practitioners in settings where fine-tuning is infeasible. To our knowledge, AdaCubic is the first optimizer to leverage cubic regularization in scalable deep learning applications.
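The abstract names Hutchinson's method as the tool for cheap Hessian approximation. As a rough illustration (not the paper's implementation), the sketch below estimates the Hessian diagonal of a toy quadratic as E[z ⊙ Hz] with Rademacher probe vectors z; the function names and sample count are illustrative choices, and in deep learning the Hessian-vector product would come from automatic differentiation rather than an explicit matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy quadratic f(x) = 0.5 * x^T A x, whose Hessian is A.
# Using a known Hessian lets us check the estimate directly.
A = np.array([[3.0, 0.5, 0.0],
              [0.5, 2.0, 0.1],
              [0.0, 0.1, 1.0]])

def hvp(v):
    # Hessian-vector product. In a deep-learning setting this would be
    # computed with autodiff (a double backward pass), never by forming
    # the full Hessian.
    return A @ v

def hutchinson_diag(hvp_fn, dim, num_samples=2000, rng=rng):
    """Estimate diag(H) as the sample mean of z * (H z) over
    Rademacher probes z with i.i.d. entries in {-1, +1}."""
    est = np.zeros(dim)
    for _ in range(num_samples):
        z = rng.integers(0, 2, size=dim) * 2.0 - 1.0
        est += z * hvp_fn(z)
    return est / num_samples

diag_est = hutchinson_diag(hvp, dim=3)
print(diag_est)  # close to the true diagonal [3.0, 2.0, 1.0]
```

The estimator is unbiased because E[z_i z_j] = 0 for i ≠ j, so the off-diagonal terms of H z cancel in expectation; the variance grows with the magnitude of the off-diagonal entries, which is why many probes are averaged.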
Problem

Research questions and friction points this paper is trying to address.

cubic regularization
adaptive optimizer
deep learning
Hessian approximation
hyperparameter tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

cubic regularization
adaptive optimization
Hessian approximation
Newton-type method
hyperparameter-free
Ioannis Tsingalis
Aristotle University of Thessaloniki, Greece
Constantine Kotropoulos
Aristotle University of Thessaloniki, Greece
Corentin Briat
FHNW
Systems and Control Theory, Complex Systems and Networks, Systems and Synthetic Biology, Molecular Control Systems, Mathematics