🤖 AI Summary
Traditional least squares (LS) is commonly treated as a static fitting tool, which makes it incompatible with end-to-end differentiable learning frameworks. This work reformulates LS as a differentiable operator: analytical gradients are derived via the implicit function theorem, and matrix-free iterative solvers eliminate explicit storage and inversion of large matrices. A customized backward pass further allows structural constraints—such as sparsity and conservativeness—to be incorporated implicitly, without modifying the forward computation. The resulting operator integrates seamlessly into automatic differentiation systems and supports joint optimization with neural networks. Experiments demonstrate computational efficiency at the 50-million-parameter scale, along with successful application to physics-constrained generative modeling and performance-driven hyperparameter tuning of Gaussian processes. By unifying classical LS estimation with modern differentiable programming, this approach significantly broadens the expressivity and practical utility of least squares within contemporary machine learning pipelines.
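To make the core mechanics concrete, here is a minimal sketch of a differentiable least-squares solve in the spirit the summary describes: the forward pass solves the normal equations with matrix-free conjugate gradients (only matrix-vector products with `A` and `A.T`, no explicit inverse), and the backward pass uses the implicit function theorem to differentiate the solution with respect to `b`. This is an illustrative reconstruction under standard assumptions, not the paper's actual implementation; all function names (`cg`, `ls_solve`, `ls_grad_b`) are hypothetical.

```python
import numpy as np

def cg(matvec, rhs, tol=1e-10, maxiter=500):
    """Conjugate gradients for an SPD system, using only matrix-vector products."""
    x = np.zeros_like(rhs)
    r = rhs - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(maxiter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def ls_solve(A, b):
    """Forward pass: argmin_x ||Ax - b||^2 via CG on the normal
    equations A^T A x = A^T b (matrix-free: A is never inverted)."""
    normal = lambda v: A.T @ (A @ v)
    return cg(normal, A.T @ b)

def ls_grad_b(A, g):
    """Backward pass: the optimality condition A^T A x - A^T b = 0 and the
    implicit function theorem give dx/db = (A^T A)^{-1} A^T, so the
    gradient of a loss w.r.t. b is A (A^T A)^{-1} g for upstream grad g."""
    normal = lambda v: A.T @ (A @ v)
    return A @ cg(normal, g)
```

The same pattern extends to gradients with respect to `A`, which is what allows the solver to be dropped into an autodiff framework as a layer: the backward pass costs one additional linear solve, again done matrix-free.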
📝 Abstract
This paper argues that the method of least squares has significant unfulfilled potential in modern machine learning, far beyond merely being a tool for fitting linear models. To unlock this potential, we derive custom gradients that turn the solver into a differentiable operator, akin to a neural network layer, enabling many diverse applications. Empirically, we demonstrate: (i) scalability, by enforcing weight sparsity on a 50-million-parameter model; (ii) imposing conservativeness constraints in score-based generative models; and (iii) hyperparameter tuning of Gaussian processes based on predictive performance. In doing so, our work represents the next iteration in developing differentiable linear-algebra tools and making them widely accessible to machine learning practitioners.