๐ค AI Summary
This work studies knowledge transfer in overparameterized linear models under task mismatchโwhere training is classification but testing is regression. Theoretically, we prove that the classification minimum-norm interpolator fails to generalize to regression in the zero-shot setting: its parameters exhibit structured decay, inducing an irreducible prediction bias. To overcome this limitation, we propose a post-processing algorithm leveraging a small number of regression samples; through fine-grained parameter calibration, it achieves asymptotically unbiased and statistically optimal regression predictions. This work establishes the first provable linear benchmark framework for cross-task transfer, rigorously characterizing the inherent impossibility of zero-shot transfer between classification and regression. Moreover, it derives tight performance bounds for the few-shot regime, providing fundamental theoretical insights into the cross-task generalization capabilities of modern overparameterized models.
๐ Abstract
Modern machine learning methods have recently demonstrated remarkable capability to generalize under task shift, where latent knowledge is transferred to a different, often more difficult, task under a similar data distribution. We investigate this phenomenon in an overparameterized linear regression setting where the task shifts from classification during training to regression during evaluation. In the zero-shot case, wherein no regression data is available, we prove that task shift is impossible in both sparse signal and random signal models for any Gaussian covariate distribution. In the few-shot case, wherein limited regression data is available, we propose a simple postprocessing algorithm which asymptotically recovers the ground-truth predictor. Our analysis leverages a fine-grained characterization of individual parameters arising from minimum-norm interpolation which may be of independent interest. Our results show that while minimum-norm interpolators for classification cannot transfer to regression a priori, they experience surprisingly structured attenuation which enables successful task shift with limited additional data.