SCPL: Enhancing Neural Network Training Throughput with Decoupled Local Losses and Model Parallelism

📅 2026-01-20
🏛️ ACM Transactions on Management Information Systems
📈 Citations: 0
Influential: 0
🤖 AI Summary
Standard end-to-end backpropagation for training deep neural networks suffers from low computational efficiency, high training costs, and prolonged development cycles, hindering the deployment of large models in enterprise information systems. To address these limitations, this work proposes Supervised Contrastive Parallel Learning (SCPL), a novel approach that integrates supervised contrastive learning with decoupled local losses. By decomposing long gradient flows into multiple shorter ones, SCPL breaks the sequential dependency inherent in backpropagation, enabling parallel gradient computation across network layers. The method incorporates a multi-stage gradient synchronization mechanism that maintains model performance while substantially improving training throughput. Experimental results demonstrate that SCPL outperforms standard backpropagation, Early Exit, GPipe, and Associated Learning in training efficiency, offering a new paradigm for scalable and efficient large-model training.
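The core idea, breaking one long gradient flow into several short, independent ones via detached local losses, can be sketched with a toy NumPy example. This is an illustrative sketch only: the linear blocks, the MSE local losses, and the random projection `P1` are simplifications standing in for the paper's supervised contrastive objective, not the authors' released implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: two linear "blocks", each trained with its own local loss.
# In standard BP, block 1's gradient depends on block 2's backward pass;
# here each block's loss sees only a detached copy of its input, so the
# two gradient computations are independent and can run in parallel.

x = rng.normal(size=(8, 4))         # batch of inputs
y = rng.normal(size=(8, 2))         # targets (plain regression for simplicity)

W1 = rng.normal(size=(4, 3)) * 0.1  # block 1 parameters
W2 = rng.normal(size=(3, 2)) * 0.1  # block 2 parameters

def local_grad(inp, W, target_proj):
    """Gradient of the local MSE loss ||inp @ W - target_proj||^2 w.r.t. W.

    `inp` is treated as a constant (detached), so no gradient flows back
    through earlier blocks -- the "short gradient flow" of decoupled training.
    """
    out = inp @ W
    return 2.0 * inp.T @ (out - target_proj) / len(inp)

# Block 1's local target: a fixed projection of y (a stand-in for the
# paper's supervised contrastive objective, which is not reproduced here).
P1 = rng.normal(size=(2, 3))
h1 = x @ W1                         # forward through block 1
g1 = local_grad(x, W1, y @ P1)      # uses only x, W1, y -- no W2 needed

# Block 2 trains on a *detached* h1: its gradient needs no signal from
# block 1's loss, so both gradients can be computed simultaneously.
h1_detached = h1.copy()
g2 = local_grad(h1_detached, W2, y)

W1 -= 0.1 * g1                      # each block updated independently
W2 -= 0.1 * g2
print(g1.shape, g2.shape)
```

Note that `g1` is computed without ever touching `W2`, and `g2` without touching block 1's loss; this independence is what allows gradients of different layers to be computed in parallel across devices.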

📝 Abstract
Adopting large-scale AI models in enterprise information systems is often hindered by high training costs and long development cycles, posing a significant managerial challenge. The standard end-to-end backpropagation (BP) algorithm is a primary driver of modern AI, but it is also the source of inefficiency in training deep networks. This paper introduces a new training methodology, Supervised Contrastive Parallel Learning (SCPL), that addresses this issue by decoupling BP and transforming a long gradient flow into multiple short ones. This design enables the simultaneous computation of parameter gradients in different layers, achieving superior model parallelism and enhancing training throughput. Detailed experiments are presented to demonstrate the efficiency and effectiveness of our model compared to BP, Early Exit, GPipe, and Associated Learning (AL), a state-of-the-art method for decoupling backpropagation. By mitigating a fundamental performance bottleneck, SCPL provides a practical pathway for organizations to develop and deploy advanced information systems more cost-effectively and with greater agility. The experimental code is released for reproducibility.
Problem

Research questions and friction points this paper is trying to address.

training throughput
backpropagation inefficiency
model parallelism
large-scale AI models
training cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

Supervised Contrastive Parallel Learning
Decoupled Backpropagation
Model Parallelism
Training Throughput
Local Losses
Ming-Yao Ho
Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
Cheng-Kai Wang
Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
You-Teng Lin
Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
Hung-Hsuan Chen
Associate Professor of National Central University
machine learning · deep learning · information retrieval