Deep Multi-Task Learning Has Low Amortized Intrinsic Dimensionality

📅 2025-01-31
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper investigates parameter-efficient modeling and generalization in deep multi-task learning. Building on the empirical observation that deep models learn within low-dimensional subspaces of weight space, we propose a low-dimensional parameterization of multi-task networks based on random expansions, substantially reducing the number of free parameters. We show that the resulting amortized intrinsic dimension is lower than what single-task learning requires, and we derive the first non-vacuous PAC-Bayesian generalization bounds for deep multi-task networks. Through a joint analysis of intrinsic dimension estimation, weight compression, and random expansion, we demonstrate that the method achieves substantial parameter reduction (an average intrinsic dimension far below the total parameter count) while preserving high multi-task accuracy, and it provides the first theoretically grounded characterization of generalization for such compressed multi-task architectures.
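The paper's code is not reproduced on this page. As a minimal sketch of the general random-expansion idea the summary refers to, the PyTorch snippet below trains a layer only through a small vector z, reconstructing the full weights as w = w0 + P z with a frozen random matrix P (the standard intrinsic-dimension construction). Class and variable names (`RandomExpansionLinear`, `intrinsic_dim`, `P`, `z`) are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class RandomExpansionLinear(nn.Module):
    """Linear layer whose weights live in a low-dimensional subspace.

    The full weight vector is reconstructed as w = w0 + P @ z, where w0 is a
    frozen random initialization, P is a fixed random expansion matrix, and
    only the d-dimensional vector z is trained. (Hypothetical sketch; not the
    paper's exact construction.)
    """

    def __init__(self, in_features, out_features, intrinsic_dim):
        super().__init__()
        n_params = in_features * out_features
        # Frozen initialization and frozen random expansion matrix (buffers, not trained).
        self.register_buffer("w0", torch.randn(n_params) * 0.02)
        self.register_buffer("P", torch.randn(n_params, intrinsic_dim) / intrinsic_dim ** 0.5)
        # The only trainable parameters: a small low-dimensional vector.
        self.z = nn.Parameter(torch.zeros(intrinsic_dim))
        self.in_features, self.out_features = in_features, out_features

    def forward(self, x):
        # Expand the low-dimensional parameters into the full weight matrix.
        w = (self.w0 + self.P @ self.z).view(self.out_features, self.in_features)
        return x @ w.t()

layer = RandomExpansionLinear(784, 10, intrinsic_dim=64)
print(sum(p.numel() for p in layer.parameters()))  # 64 trainable parameters instead of 7840
```

In a multi-task setting, one plausible reading of "amortized intrinsic dimension" is that most of the low-dimensional vector is shared across tasks (with small per-task components), so the trainable parameter count is amortized over tasks; this is an interpretation for illustration, not the paper's stated construction.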

📝 Abstract
Deep learning methods are known to generalize well from training to future data, even in an overparametrized regime, where they could easily overfit. One explanation for this phenomenon is that even when their *ambient dimensionality* (i.e., the number of parameters) is large, the models' *intrinsic dimensionality* is small, i.e., their learning takes place in a small subspace of all possible weight configurations. In this work, we confirm this phenomenon in the setting of *deep multi-task learning*. We introduce a method to parametrize multi-task networks directly in the low-dimensional space, facilitated by the use of *random expansion* techniques. We then show that high-accuracy multi-task solutions can be found with much smaller intrinsic dimensionality (fewer free parameters) than what single-task learning requires. Subsequently, we show that the low-dimensional representations in combination with *weight compression* and *PAC-Bayesian* reasoning lead to the first *non-vacuous generalization bounds* for deep multi-task networks.
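The paper's multi-task bound is not reproduced on this page. As a reference point, a standard single-task McAllester-style PAC-Bayesian bound, of the kind such analyses typically build on, has the following form (symbols L, L-hat, Q, P, m, delta are the usual population risk, empirical risk, posterior, prior, sample size, and confidence parameter; this is a generic bound, not the paper's result):

```latex
% Standard McAllester-style PAC-Bayesian bound (generic, single-task form).
% With probability at least 1 - \delta over an i.i.d. sample of size m,
% simultaneously for every posterior Q, given a prior P fixed in advance:
\mathbb{E}_{\theta \sim Q}\!\left[ L(\theta) \right]
  \;\le\;
\mathbb{E}_{\theta \sim Q}\!\left[ \hat{L}(\theta) \right]
  + \sqrt{\frac{\operatorname{KL}(Q \,\|\, P) + \ln \frac{2\sqrt{m}}{\delta}}{2m}}
```

Roughly speaking, the connection to the abstract is that restricting training to a low-dimensional random subspace and compressing the resulting weights keeps the KL term small, which is what can make such bounds non-vacuous for deep networks.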
Problem

Research questions and friction points this paper is trying to address.

Deep Multi-task Learning
Parameter Efficiency
Generalization Ability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-task Learning
Parameter Efficiency
PAC-Bayesian Inference
🔎 Similar Papers
No similar papers found.