🤖 AI Summary
This work investigates the intrinsic conditions under which deep pretrained models can achieve modular compositionality in nonlinear networks, enabling multi-task modeling, task specialization, and selective forgetting.
Method: We propose a theoretical framework based on the second-order Taylor expansion of the loss function—introducing, for the first time, a second-order optimization perspective into compositional analysis—and identify "preserving the pretrained attractor basin" as the key mechanism ensuring modular composability. Building on this insight, we design a dual-path incremental training algorithm that supports module reuse, dynamic task integration, and controllable knowledge unlearning.
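As a rough sketch of the second-order view (the notation below is ours, not taken from the paper): writing the pretrained weights as $\theta_0$ and each task module as a parameter offset $\tau_i$, a second-order Taylor expansion of the loss around $\theta_0$ reads

```latex
\mathcal{L}\Big(\theta_0 + \sum_i \tau_i\Big) \;\approx\; \mathcal{L}(\theta_0)
  \;+\; \nabla\mathcal{L}(\theta_0)^{\top} \sum_i \tau_i
  \;+\; \frac{1}{2}\Big(\sum_i \tau_i\Big)^{\top} H(\theta_0) \Big(\sum_i \tau_i\Big)
```

Under this approximation, if every module stays inside the basin of $\theta_0$ (near-zero gradient, a shared Hessian $H$), interference between modules is governed by the cross-terms $\tau_i^{\top} H \tau_j$, which is one way to read why "preserving the pretrained attractor basin" matters for composability.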
Results: Empirical evaluation on incremental classification tasks demonstrates that the constructed module pool enables efficient synthesis of high-performance multi-task models, improves single-task accuracy, and facilitates targeted knowledge erasure—thereby significantly enhancing model editability and adaptability.
📝 Abstract
The fine-tuning of deep pre-trained models has revealed compositional properties, with multiple specialized modules that can be arbitrarily composed into a single, multi-task model. However, identifying the conditions that promote compositionality remains an open issue, with recent efforts concentrating mainly on linearized networks. We conduct a theoretical study that attempts to demystify compositionality in standard non-linear networks through the second-order Taylor approximation of the loss function. The proposed formulation highlights the importance of staying within the pre-training basin to achieve composable modules. Moreover, it provides the basis for two dual incremental training algorithms: one takes the perspective of multiple models trained individually, while the other optimizes the composed model as a whole. We probe their application in incremental classification tasks and highlight several valuable capabilities. In fact, the pool of incrementally learned modules not only supports the creation of an effective multi-task model but also enables unlearning and specialization in certain tasks. Code available at https://github.com/aimagelab/mammoth.
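As an illustrative sketch only (the function name and the additive "task vector" form below are our assumptions for exposition, not the repository's API), composing modules and selectively forgetting one of them can be pictured as arithmetic in parameter space:

```python
import numpy as np

def compose(theta0, finetuned, alphas=None):
    """Compose task modules as weighted parameter deltas added to the
    pretrained weights: theta0 + sum_i alpha_i * (theta_i - theta0)."""
    deltas = [theta - theta0 for theta in finetuned]
    if alphas is None:
        alphas = [1.0] * len(deltas)  # plain sum of all modules
    return theta0 + sum(a * d for a, d in zip(alphas, deltas))

# Toy example: two "task modules" fine-tuned from the same pretrained point.
theta0 = np.zeros(3)
task_a = np.array([1.0, 0.0, 0.0])
task_b = np.array([0.0, 2.0, 0.0])

multi_task = compose(theta0, [task_a, task_b])          # keep both tasks
forget_a   = compose(theta0, [task_a, task_b], [-1, 1])  # negate module A
```

A negative coefficient subtracts a module's contribution, a toy analogue of the selective forgetting described in the abstract; the paper's actual algorithms operate on non-linear networks and are more involved than this linear picture.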