MAN++: Scaling Momentum Auxiliary Network for Supervised Local Learning in Vision Tasks

📅 2025-07-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
End-to-end backpropagation in deep learning suffers from update locking, high GPU memory consumption, and biological implausibility; supervised local learning alleviates these issues but incurs substantial performance degradation because gradients are isolated within individual modules. To address this trade-off, we propose MAN++, a Momentum Auxiliary Network framework that enhances inter-module information flow via two key mechanisms: (i) a dynamic interaction module that uses an exponential moving average (EMA) of adjacent-block parameters to stabilize and propagate module-level updates, and (ii) a learnable scaling bias for adaptive alignment of local feature distributions. MAN++ supports diverse vision tasks, including image classification, object detection, and semantic segmentation, across multiple architectures and datasets. It achieves accuracy comparable to end-to-end training while reducing GPU memory usage by 30–50%. This work establishes a new paradigm for efficient, scalable, and biologically plausible local training.

📝 Abstract
Deep learning typically relies on end-to-end backpropagation for training, a method that inherently suffers from issues such as update locking during parameter optimization, high GPU memory consumption, and a lack of biological plausibility. In contrast, supervised local learning seeks to mitigate these challenges by partitioning the network into multiple local blocks and designing independent auxiliary networks to update each block separately. However, because gradients are propagated solely within individual local blocks, performance degradation occurs, preventing supervised local learning from supplanting end-to-end backpropagation. To address these limitations and facilitate inter-block information flow, we propose the Momentum Auxiliary Network++ (MAN++). MAN++ introduces a dynamic interaction mechanism by employing the Exponential Moving Average (EMA) of parameters from adjacent blocks to enhance communication across the network. The auxiliary network, updated via EMA, effectively bridges the information gap between blocks. Notably, we observed that directly applying EMA parameters can be suboptimal due to feature discrepancies between local blocks. To resolve this issue, we introduce a learnable scaling bias that balances feature differences, thereby further improving performance. We validate MAN++ through extensive experiments on tasks that include image classification, object detection, and image segmentation, utilizing multiple network architectures. The experimental results demonstrate that MAN++ achieves performance comparable to end-to-end training while significantly reducing GPU memory usage. Consequently, MAN++ offers a novel perspective for supervised local learning and presents a viable alternative to conventional training methods.
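The two mechanisms the abstract describes, an EMA of adjacent-block parameters and a learnable scaling bias over features, can be sketched in a few lines. This is a minimal, illustrative sketch rather than the paper's implementation: parameters and per-channel features are modeled as flat lists of floats, and the names `ema_update` and `scale_bias` are assumptions for illustration.

```python
def ema_update(aux_params, block_params, momentum=0.999):
    """Update auxiliary-network parameters as an exponential moving
    average of the adjacent local block's parameters."""
    return [momentum * a + (1.0 - momentum) * b
            for a, b in zip(aux_params, block_params)]


def scale_bias(features, scale, bias):
    """Apply a learnable per-channel scale and bias to align the
    feature distribution handed from one local block to the next."""
    return [s * f + b for f, s, b in zip(features, scale, bias)]


# Illustrative usage: one EMA step pulls the auxiliary parameters
# slightly toward the adjacent block's parameters.
aux = [0.0, 0.0]
block = [1.0, 2.0]
aux = ema_update(aux, block, momentum=0.9)

# The scaling bias then rebalances features before they cross the
# block boundary; scale and bias would be learned during training.
out = scale_bias([1.0, 1.0], scale=[2.0, 3.0], bias=[0.5, -0.5])
```

In a real training loop, the EMA update would run once per step with no gradient flowing through it, while the scale and bias are optimized jointly with the local block's auxiliary loss.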
Problem

Research questions and friction points this paper is trying to address.

Mitigates update locking and high GPU memory in deep learning
Enhances inter-block communication in supervised local learning
Reduces performance degradation in partitioned network training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses EMA for inter-block parameter sharing
Introduces learnable scaling bias for feature balance
Reduces GPU memory while matching end-to-end performance
Junhao Su
MeiTuan Inc.
Computer Vision
Feiyu Zhu
AttrSense, Shanghai, China
Hengyu Shi
Visual Intelligence in MeiTuan, Beijing and Shanghai, China
Tianyang Han
The Hong Kong Polytechnic University (PolyU)
Image Generation, Multimodal Large Language Models
Yurui Qiu
Visual Intelligence in MeiTuan, Beijing and Shanghai, China
Junfeng Luo
Visual Intelligence in MeiTuan, Beijing and Shanghai, China
Xiaoming Wei
Meituan
Computer Vision, Machine Learning
Jialin Gao
National University of Singapore
Video Understanding, Multi-modal Understanding