🤖 AI Summary
This work addresses the scalability bottleneck of the linearized Laplace approximation (LLA) in large-scale pretrained models, which arises from the explicit computation of prohibitively large Jacobian matrices. To overcome this limitation, the authors propose an efficient alternative based on a surrogate neural network that learns compact feature representations whose inner products approximate the neural tangent kernel (NTK). Because training relies solely on efficient Jacobian-vector products, the method never constructs the full Jacobian matrix. A further contribution is that the learned surrogate kernel can be deliberately biased away from the exact NTK: this not only preserves the computational gains but also enhances out-of-distribution detection, moving beyond traditional LLA's reliance on a fixed kernel. Experimental results demonstrate that the proposed method scales to large models and achieves superior out-of-distribution detection while maintaining or improving uncertainty calibration.
📝 Abstract
We introduce a scalable method to approximate the kernel of the Linearized Laplace Approximation (LLA). For this, we use a surrogate deep neural network (DNN) that learns a compact feature representation whose inner product replicates the Neural Tangent Kernel (NTK). This avoids the need to compute large Jacobians. Training relies solely on efficient Jacobian-vector products, allowing predictive uncertainty to be computed on large-scale pre-trained DNNs. Experimental results show similar or improved uncertainty estimation and calibration compared to existing LLA approximations. Moreover, biasing the learned kernel significantly enhances out-of-distribution detection. This highlights the benefits of the proposed method for finding better kernels than the NTK in the context of LLA to compute prediction uncertainty given a pre-trained DNN.
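To make the central idea concrete, the sketch below shows how NTK entries can be estimated with only Jacobian-vector products, never forming the Jacobian: for a random vector v with E[vvᵀ] = I, we have E[(J(x)v)(J(x′)v)] = J(x)J(x′)ᵀ = NTK(x, x′). This is a minimal illustration of the kernel the surrogate network would be trained to replicate, not the paper's actual method; the tiny MLP, its parameter shapes, and the sample count are all hypothetical choices made for the example.

```python
import jax
import jax.numpy as jnp

# Hypothetical tiny scalar-output MLP standing in for the pre-trained DNN
# f(x; theta); the paper targets much larger pre-trained models.
def mlp(params, x):
    w1, b1, w2, b2 = params
    h = jnp.tanh(x @ w1 + b1)
    return (h @ w2 + b2).squeeze()

def random_like(params, key):
    # Sample v ~ N(0, I) with the same pytree structure as params.
    leaves, treedef = jax.tree_util.tree_flatten(params)
    keys = jax.random.split(key, len(leaves))
    noise = [jax.random.normal(k, p.shape) for k, p in zip(keys, leaves)]
    return jax.tree_util.tree_unflatten(treedef, noise)

def ntk_estimate(params, x1, x2, key, num_samples=4096):
    # Unbiased Monte Carlo estimate of NTK(x1, x2) = J(x1) J(x2)^T using
    # only JVPs: E_v[(J(x1) v)(J(x2) v)] = J(x1) J(x2)^T when E[v v^T] = I.
    def one_sample(k):
        v = random_like(params, k)
        jv1 = jax.jvp(lambda p: mlp(p, x1), (params,), (v,))[1]
        jv2 = jax.jvp(lambda p: mlp(p, x2), (params,), (v,))[1]
        return jv1 * jv2
    return jnp.mean(jax.vmap(one_sample)(jax.random.split(key, num_samples)))

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
params = (
    0.3 * jax.random.normal(k1, (3, 8)),
    jnp.zeros(8),
    0.3 * jax.random.normal(k2, (8, 1)),
    jnp.zeros(1),
)
x = jnp.array([0.5, -1.0, 2.0])

est = float(ntk_estimate(params, x, x, k3))
# Sanity check: for a scalar output, the exact diagonal NTK entry is
# ||grad_theta f(x)||^2, computable here because the toy model is small.
g = jax.grad(mlp)(params, x)
exact = float(sum(jnp.sum(leaf ** 2) for leaf in jax.tree_util.tree_leaves(g)))
print(est, exact)
```

In the paper's setting, the exact diagonal check above is infeasible at scale; instead a surrogate network g is trained so that g(x)·g(x′) matches such JVP-based kernel evaluations, which is where the freedom to bias the kernel comes from.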