🤖 AI Summary
This paper investigates the learnability of structure shared across tasks, such as low-rank subspaces or clusterings, in few-shot (especially one-shot) multi-task learning and meta-learning, under general convex objectives beyond linear regression. We jointly model low-rank and clustering structure over the task-specific optimal solutions and introduce concentration assumptions on the Hessian and on the noise at these optima. We establish, for the first time, that subspace recovery is feasible from a single sample per task, but that the number of tasks must scale exponentially with the subspace dimension; we further propose the first polynomial-time nuclear-norm regularized algorithm for this setting. Our theoretical analysis characterizes the sample- and task-complexity trade-offs required for structure recovery. The proposed method is scalable, enjoys statistical guarantees, and enables efficient learning of shared linear representations under general convex objectives.
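As a concrete illustration, the nuclear-norm approach can be sketched as proximal gradient descent on the matrix stacking all task parameters, with singular value thresholding as the proximal step. This is a minimal sketch under stated assumptions, not the paper's algorithm: squared loss stands in for the generic convex objective, and `svt`, `multitask_nuclear_norm`, and all hyperparameters are illustrative names and defaults.

```python
import numpy as np

def svt(W, tau):
    """Singular value thresholding: the proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def multitask_nuclear_norm(Xs, ys, lam=0.1, lr=0.01, n_iters=500):
    """Proximal gradient descent on the d x T matrix W (one column per task),
    using squared loss as a stand-in for each task's convex objective."""
    d, T = Xs[0].shape[1], len(Xs)
    W = np.zeros((d, T))
    for _ in range(n_iters):
        G = np.zeros_like(W)
        for t, (X, y) in enumerate(zip(Xs, ys)):
            # Gradient of task t's convex loss at its current parameters.
            G[:, t] = X.T @ (X @ W[:, t] - y) / len(y)
        # Gradient step on the smooth losses, prox step on the nuclear norm.
        W = svt(W - lr * G, lr * lam)
    return W

# Example: 50 tasks whose optima lie in a 2-dimensional subspace of R^10,
# with only 5 samples each.
rng = np.random.default_rng(0)
U_true = np.linalg.qr(rng.normal(size=(10, 2)))[0]
Xs = [rng.normal(size=(5, 10)) for _ in range(50)]
ys = [X @ (U_true @ rng.normal(size=2)) for X in Xs]
W_hat = multitask_nuclear_norm(Xs, ys, lam=0.05)
```

The shared subspace can then be estimated from the leading left singular vectors of the returned matrix `W_hat`.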
📝 Abstract
Motivated by multi-task and meta-learning approaches, we consider the problem of learning structure shared by tasks or users, such as shared low-rank representations or clustered structures. While all previous works focus on well-specified linear regression, we consider more general convex objectives, where the structural low-rank and cluster assumptions are expressed on the optima of each function. We show that under mild assumptions such as *Hessian concentration* and *noise concentration at the optimum*, rank- and cluster-regularized estimators recover such structure, provided the number of samples per task and the number of tasks are large enough. We then study the problem of recovering the subspace in which all the solutions lie, in the setting where there is only a single sample per task: we show that in this case the rank-constrained estimator can recover the subspace, but that the number of tasks must scale exponentially with the dimension of the subspace. Finally, we provide a polynomial-time algorithm via nuclear norm constraints for learning a shared linear representation in the context of convex learning objectives.
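A plausible formalization of the nuclear-norm constrained estimator described above, consistent with the abstract but in our notation rather than the paper's ($\ell_t$, $n_t$, $z_{t,i}$, and $\rho$ are assumptions here):

$$
\widehat{W} \in \operatorname*{arg\,min}_{W=(w_1,\dots,w_T)\,\in\,\mathbb{R}^{d\times T}} \; \sum_{t=1}^{T} \frac{1}{n_t}\sum_{i=1}^{n_t} \ell_t\!\left(w_t;\, z_{t,i}\right) \quad \text{subject to} \quad \|W\|_{*} \le \rho,
$$

where each column $w_t$ collects the parameters of task $t$, $\ell_t$ is that task's convex loss on its samples $z_{t,i}$, and $\|W\|_{*}$ is the nuclear norm, whose constraint is a convex surrogate for low rank and encourages the task optima to lie in a common low-dimensional subspace.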