SCPL: Enhancing Neural Network Training Throughput with Decoupled Local Losses and Model Parallelism

📅 2026-01-20
🏛️ ACM Transactions on Management Information Systems
📈 Citations: 0
Influential: 0
🤖 AI Summary
Standard end-to-end backpropagation for training deep neural networks suffers from low computational efficiency, high training costs, and prolonged development cycles, hindering the deployment of large models in enterprise information systems. To address these limitations, this work proposes Supervised Contrastive Parallel Learning (SCPL), a novel approach that integrates supervised contrastive learning with decoupled local losses. By decomposing long gradient flows into multiple shorter ones, SCPL breaks the sequential dependency inherent in backpropagation, enabling parallel gradient computation across network layers. The method incorporates a multi-stage gradient synchronization mechanism that maintains model performance while substantially improving training throughput. Experimental results demonstrate that SCPL outperforms standard backpropagation, Early Exit, GPipe, and Associated Learning in training efficiency, offering a new paradigm for scalable and efficient large-model training.
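The core idea, breaking one long gradient flow into several short, independent ones via detached local losses, can be sketched with a toy NumPy example. This is an illustrative sketch only: the linear blocks, the MSE local losses, and the random projection `P1` are simplifications standing in for the paper's supervised contrastive objective, not the authors' released implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: two linear "blocks", each trained with its own local loss.
# In standard BP, block 1's gradient depends on block 2's backward pass;
# here each block's loss sees only a detached copy of its input, so the
# two gradient computations are independent and can run in parallel.

x = rng.normal(size=(8, 4))         # batch of inputs
y = rng.normal(size=(8, 2))         # targets (plain regression for simplicity)

W1 = rng.normal(size=(4, 3)) * 0.1  # block 1 parameters
W2 = rng.normal(size=(3, 2)) * 0.1  # block 2 parameters

def local_grad(inp, W, target_proj):
    """Gradient of the local MSE loss ||inp @ W - target_proj||^2 w.r.t. W.

    `inp` is treated as a constant (detached), so no gradient flows back
    through earlier blocks -- the "short gradient flow" of decoupled training.
    """
    out = inp @ W
    return 2.0 * inp.T @ (out - target_proj) / len(inp)

# Block 1's local target: a fixed projection of y (a stand-in for the
# paper's supervised contrastive objective, which is not reproduced here).
P1 = rng.normal(size=(2, 3))
h1 = x @ W1                         # forward through block 1
g1 = local_grad(x, W1, y @ P1)      # uses only x, W1, y -- no W2 needed

# Block 2 trains on a *detached* h1: its gradient needs no signal from
# block 1's loss, so both gradients can be computed simultaneously.
h1_detached = h1.copy()
g2 = local_grad(h1_detached, W2, y)

W1 -= 0.1 * g1                      # each block updated independently
W2 -= 0.1 * g2
print(g1.shape, g2.shape)
```

Note that `g1` is computed without ever touching `W2`, and `g2` without touching block 1's loss; this independence is what allows gradients of different layers to be computed in parallel across devices.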

📝 Abstract
Adopting large-scale AI models in enterprise information systems is often hindered by high training costs and long development cycles, posing a significant managerial challenge. The standard end-to-end backpropagation (BP) algorithm is a primary driver of modern AI, but it is also the source of inefficiency in training deep networks. This paper introduces a new training methodology, Supervised Contrastive Parallel Learning (SCPL), that addresses this issue by decoupling BP and transforming a long gradient flow into multiple short ones. This design enables the simultaneous computation of parameter gradients in different layers, achieving superior model parallelism and enhancing training throughput. Detailed experiments are presented to demonstrate the efficiency and effectiveness of our model compared to BP, Early Exit, GPipe, and Associated Learning (AL), a state-of-the-art method for decoupling backpropagation. By mitigating a fundamental performance bottleneck, SCPL provides a practical pathway for organizations to develop and deploy advanced information systems more cost-effectively and with greater agility. The experimental code is released for reproducibility.
Problem

Research questions and friction points this paper is trying to address.

training throughput
backpropagation inefficiency
model parallelism
large-scale AI models
training cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

Supervised Contrastive Parallel Learning
Decoupled Backpropagation
Model Parallelism
Training Throughput
Local Losses
Ming-Yao Ho
Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
Cheng-Kai Wang
Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
You-Teng Lin
Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
Hung-Hsuan Chen
Associate Professor of National Central University
machine learning · deep learning · information retrieval