🤖 AI Summary
This work addresses the slow convergence, memory overhead, and limited theoretical characterization of Consensus-Based Optimization (CBO) applied to two-layer neural networks, including in multi-task learning. Methodologically, it (1) couples the mean-field limit of CBO with the mean-field limit of the network, yielding a joint dynamical model on the Wasserstein-over-Wasserstein space; (2) combines CBO with Adam in a hybrid optimizer to accelerate convergence; and (3) recasts the CBO update rule in a formulation with lower memory overhead in multi-task settings. Theoretically, after reformulating CBO within the optimal transport framework, it defines the corresponding dynamics on the Wasserstein-over-Wasserstein space in the infinite-particle limit and shows that the variance decreases monotonically. Empirically, on two test cases, the CBO-Adam hybrid converges faster than plain CBO.
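The summary does not spell out the update rule itself. For orientation, below is a minimal NumPy sketch of the standard CBO particle step from the literature (drift toward a Gibbs-weighted consensus point plus a diffusion scaled by each particle's distance to it); the parameter names and default values (`alpha`, `lam`, `sigma`, `dt`) are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def cbo_step(X, loss, alpha=50.0, lam=1.0, sigma=0.7, dt=0.01, rng=None):
    """One Euler-Maruyama step of standard (isotropic) CBO.

    X    : (N, d) array of particle positions.
    loss : callable mapping an (N, d) array to an (N,) array of loss values.
    """
    rng = np.random.default_rng() if rng is None else rng
    f = loss(X)
    # Gibbs-weighted consensus point; subtracting min(f) stabilizes the exponential.
    w = np.exp(-alpha * (f - f.min()))
    consensus = (w[:, None] * X).sum(axis=0) / w.sum()
    # Deterministic drift toward the consensus point.
    drift = -lam * (X - consensus) * dt
    # Isotropic noise scaled by each particle's distance to the consensus.
    dist = np.linalg.norm(X - consensus, axis=1, keepdims=True)
    noise = sigma * dist * np.sqrt(dt) * rng.standard_normal(X.shape)
    return X + drift + noise
```

A hybrid scheme in the spirit of the paper might interleave such CBO steps with Adam updates on each particle, but the authors' exact coupling is not specified in this summary.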
📝 Abstract
We study two-layer neural networks trained with a particle-based method called consensus-based optimization (CBO). We compare the performance of CBO against Adam on two test cases and demonstrate that a hybrid approach, combining CBO with Adam, converges faster than CBO alone. In the context of multi-task learning, we recast CBO into a formulation with lower memory overhead. The CBO method admits a mean-field limit formulation, which we couple with the mean-field limit of the neural network. To this end, we first reformulate CBO within the optimal transport framework. Finally, in the limit of infinitely many particles, we define the corresponding dynamics on the Wasserstein-over-Wasserstein space and show that the variance decreases monotonically.
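For readers unfamiliar with CBO, the following is the standard mean-field formulation from the CBO literature, stated here for context; the paper's Wasserstein-over-Wasserstein system is a lift of this setup whose precise form the abstract does not give. Here f is the loss and α, λ, σ are the usual CBO parameters.

```latex
% Gibbs-weighted consensus point of the particle law \rho_t:
m_\alpha[\rho_t] \;=\; \frac{\int x\, e^{-\alpha f(x)}\, \mathrm{d}\rho_t(x)}
                            {\int e^{-\alpha f(x)}\, \mathrm{d}\rho_t(x)}

% McKean--Vlasov dynamics of a representative particle with law \rho_t:
\mathrm{d}X_t \;=\; -\lambda\,\bigl(X_t - m_\alpha[\rho_t]\bigr)\,\mathrm{d}t
              \;+\; \sigma\,\bigl|X_t - m_\alpha[\rho_t]\bigr|\,\mathrm{d}W_t

% Variance functional whose monotone decay is the abstract's claim
% (proved in the paper for the lifted Wasserstein-over-Wasserstein dynamics):
V(t) \;=\; \int \bigl|x - \mathbb{E}_{\rho_t}[x]\bigr|^2\, \mathrm{d}\rho_t(x),
\qquad \frac{\mathrm{d}}{\mathrm{d}t} V(t) \;\le\; 0
```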