Matrix-Free Least Squares Solvers: Values, Gradients, and What to Do With Them

📅 2025-10-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional least squares (LS) is commonly treated as a static fitting tool, which leaves it outside end-to-end differentiable learning frameworks. This work reformulates LS as a differentiable operator: analytical gradients are derived via the implicit function theorem, matrix-free iterative solvers avoid explicit storage and inversion of large matrices, and a customized backward pass incorporates structural constraints (such as sparsity and conservativeness) implicitly, without modifying the forward computation. The resulting operator integrates seamlessly into automatic differentiation systems and supports joint optimization with neural networks. Experiments demonstrate computational efficiency at the 50-million-parameter scale and successful application to physics-constrained generative modeling and performance-driven hyperparameter optimization of Gaussian processes. By unifying classical LS estimation with modern differentiable programming, the approach significantly enhances the expressivity and practical utility of least squares within contemporary machine learning pipelines.
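
To make the mechanics concrete, here is a minimal sketch (in JAX, chosen only for illustration; the paper does not mandate a framework) of how such an operator can be wired up: the forward pass solves the normal equations with matrix-free conjugate gradients, and the custom backward pass applies the implicit-function-theorem gradient via one additional adjoint solve. The names (`lstsq`, `_normal_cg`) and the dense `A` are assumptions made for this example, not taken from the paper.

```python
# Minimal sketch of a differentiable, matrix-free least-squares operator.
# Only matvecs with A are needed; a dense A is used here for readability.
import jax
import jax.numpy as jnp


def _normal_cg(A, rhs, maxiter=200):
    """Solve (A^T A) z = rhs with conjugate gradients, using only matvecs."""
    matvec = lambda z: A.T @ (A @ z)
    z, _ = jax.scipy.sparse.linalg.cg(matvec, rhs, maxiter=maxiter)
    return z


@jax.custom_vjp
def lstsq(A, b):
    """x = argmin_x ||A x - b||^2, solved via the normal equations."""
    return _normal_cg(A, A.T @ b)


def _lstsq_fwd(A, b):
    x = _normal_cg(A, A.T @ b)
    return x, (A, b, x)


def _lstsq_bwd(res, x_bar):
    # Implicit function theorem on A^T A x = A^T b:
    # solve (A^T A) lam = x_bar, then b_bar = A lam and
    # A_bar = r lam^T - (A lam) x^T, where r = b - A x is the residual.
    A, b, x = res
    lam = _normal_cg(A, x_bar)
    r = b - A @ x
    return jnp.outer(r, lam) - jnp.outer(A @ lam, x), A @ lam


lstsq.defvjp(_lstsq_fwd, _lstsq_bwd)
```

Because the backward pass is itself just one more matrix-free solve, gradients such as `jax.grad(lambda b: jnp.sum(lstsq(A, b) ** 2))(b)` flow through the operator at roughly the cost of an extra CG solve, which is what makes the downstream uses (sparsity, conservativeness constraints, hyperparameter tuning) practical.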

📝 Abstract
This paper argues that the method of least squares has significant unfulfilled potential in modern machine learning, far beyond merely being a tool for fitting linear models. To unlock this potential, we derive custom gradients that turn the solver into a differentiable operator, analogous to a neural network layer, enabling many diverse applications. Empirically, we demonstrate: (i) scalability, by enforcing weight sparsity on a 50-million-parameter model; (ii) imposing conservativeness constraints in score-based generative models; and (iii) hyperparameter tuning of Gaussian processes based on predictive performance. In doing so, our work represents a further step in developing differentiable linear-algebra tools and making them widely accessible to machine learning practitioners.
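
As a concrete, hedged reading of application (iii), the sketch below tunes GP-style hyperparameters by gradient descent on a held-out predictive loss, differentiating through the linear solve. A dense `jnp.linalg.solve` stands in for a matrix-free differentiable operator at this toy scale, and every name here (`rbf_kernel`, `gp_predict`, `val_loss`, the data arrays) is a placeholder invented for illustration rather than the paper's code.

```python
# Illustrative sketch: tune kernel hyperparameters on predictive performance
# by differentiating through the solve (dense solve used at toy scale).
import jax
import jax.numpy as jnp


def rbf_kernel(X1, X2, lengthscale):
    sq_dists = jnp.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return jnp.exp(-0.5 * sq_dists / lengthscale**2)


def gp_predict(log_params, X_train, y_train, X_test):
    lengthscale, noise = jnp.exp(log_params)   # log-space keeps both positive
    K = rbf_kernel(X_train, X_train, lengthscale) + noise * jnp.eye(len(y_train))
    alpha = jnp.linalg.solve(K, y_train)       # GP posterior mean weights
    return rbf_kernel(X_test, X_train, lengthscale) @ alpha


def val_loss(log_params, X_train, y_train, X_val, y_val):
    preds = gp_predict(log_params, X_train, y_train, X_val)
    return jnp.mean((preds - y_val) ** 2)      # "predictive performance"


# Hyperparameters become ordinary parameters to descend on, e.g.
#   log_params -= lr * jax.grad(val_loss)(log_params, X_train, y_train, X_val, y_val)
```

At the scale reported in the paper, the idea would be to swap the dense solve for the matrix-free differentiable operator, so the same tuning loop runs without explicitly storing or inverting large matrices.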
Problem

Research questions and friction points this paper is trying to address.

Unlocking the potential of least squares in modern machine learning
Developing differentiable least squares solvers as neural layers
Enabling scalable sparse models and constrained optimization applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Custom gradients transform the solver into a differentiable operator
Scalable sparsity enforcement on large parameter models
Imposing constraints in generative models and hyperparameter tuning