DeInfoReg: A Decoupled Learning Framework for Better Training Throughput

📅 2025-06-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address severe gradient vanishing and low parallelization efficiency in deep networks with long gradient propagation paths, this paper proposes DeInfoReg—a decoupled supervised learning framework. Its core innovation lies in explicit information regularization of intermediate-layer representations, which decomposes the long gradient flow into multiple short flows, fundamentally mitigating gradient vanishing. Additionally, a lightweight pipelined scheduling mechanism is designed to enable efficient cross-GPU model parallelism. DeInfoReg requires no architectural or loss-function modifications, ensuring broad compatibility. Extensive experiments across image classification and semantic segmentation demonstrate that DeInfoReg significantly improves training throughput (up to 2.1×), enhances convergence stability, and exhibits superior robustness to label noise—outperforming standard backpropagation and existing gradient decomposition methods under identical hardware constraints.
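The core idea of decoupling one long gradient flow into several short ones can be sketched with a toy example. This is a hypothetical illustration, not the paper's actual architecture or regularization loss: each block gets its own small local head and local loss, and each block treats its input as plain data (the analogue of `.detach()` in PyTorch), so no gradient ever crosses a block boundary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (hypothetical shapes, not from the paper).
x = rng.normal(size=(32, 8))
y = rng.normal(size=(32, 1))

# Two "blocks", each with its own local linear head. Because every block
# computes its own loss and treats its input as a constant, the one long
# gradient flow becomes two short ones.
W1, H1 = rng.normal(size=(8, 8)) * 0.1, rng.normal(size=(8, 1)) * 0.1
W2, H2 = rng.normal(size=(8, 8)) * 0.1, rng.normal(size=(8, 1)) * 0.1
lr = 0.05

def local_step(inp, W, H):
    """Forward through one block, compute a local MSE loss against y,
    and update only this block's parameters."""
    h = inp @ W                        # block forward
    err = h @ H - y                    # local head prediction error
    dH = h.T @ err / len(y)            # gradient w.r.t. the local head
    dW = inp.T @ (err @ H.T) / len(y)  # gradient stops at this block's input
    return h, W - lr * dW, H - lr * dH, float((err ** 2).mean())

losses = []
for _ in range(300):
    h1, W1, H1, _ = local_step(x, W1, H1)    # block 1: its own short flow
    h2, W2, H2, l2 = local_step(h1, W2, H2)  # block 2: input is plain data
    losses.append(l2)

print(f"block-2 local loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Each block's gradient path has constant length regardless of network depth, which is what removes the long backward chain that causes vanishing gradients.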

📝 Abstract
This paper introduces Decoupled Supervised Learning with Information Regularization (DeInfoReg), a novel approach that transforms a long gradient flow into multiple shorter ones, thereby mitigating the vanishing gradient problem. Integrating a pipeline strategy, DeInfoReg enables model parallelization across multiple GPUs, significantly improving training throughput. We compare our proposed method with standard backpropagation and other gradient flow decomposition techniques. Extensive experiments on diverse tasks and datasets demonstrate that DeInfoReg achieves superior performance and better noise resistance than traditional BP models and efficiently utilizes parallel computing resources. The code for reproducibility is available at: https://github.com/ianzih/Decoupled-Supervised-Learning-for-Information-Regularization/.
Problem

Research questions and friction points this paper is trying to address.

Vanishing gradients along the long gradient propagation paths of deep networks
Inefficient model parallelization across GPUs under standard backpropagation
Limited training throughput and weak resistance to label noise
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decouples one long gradient flow into multiple short, locally supervised segments
Pipeline strategy enables model parallelism across multiple GPUs
Improves training throughput and robustness to noisy labels
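Why decoupling helps throughput can be illustrated with a simple timing model. This is an assumed back-of-the-envelope sketch, not the paper's scheduler: once blocks no longer wait for a full backward pass, block k can start batch t+1 as soon as it hands batch t's activations to block k+1, so per-batch work overlaps across GPUs.

```python
def sequential_time(n_batches, n_blocks, t_block=1.0):
    # Standard BP: each batch traverses all blocks (forward + backward)
    # before the next batch can start, so per-batch times add up.
    return n_batches * n_blocks * t_block

def pipelined_time(n_batches, n_blocks, t_block=1.0):
    # Classic pipeline makespan: fill the pipe once (n_blocks stages),
    # then complete one batch per time step.
    return (n_blocks + n_batches - 1) * t_block

seq = sequential_time(8, 4)   # 8 batches through 4 blocks, serialized
pipe = pipelined_time(8, 4)   # same work with pipelined blocks
print(seq, pipe, round(seq / pipe, 2))
```

In this idealized model (uniform block times, no communication cost), speedup approaches the number of pipeline stages as the batch count grows; real gains depend on stage balance and transfer overhead.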
Zih-Hao Huang
Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
You-Teng Lin
Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
Hung-Hsuan Chen
Associate Professor, National Central University
machine learning, deep learning, information retrieval