Decoupled Split Learning via Auxiliary Loss

๐Ÿ“… 2026-01-27
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work addresses the high communication overhead and memory consumption inherent in conventional split learning, which relies on end-to-end backpropagation requiring transmission of both forward activations and backward gradients in every round. The authors propose a decoupled split learning approach that introduces an auxiliary classifier at the client-side split point to provide local loss signals, while the server trains on received activations using ground-truth labels. This design enables semi-independent model updates on both sides without exchanging backward gradientsโ€”the first such mechanism in split learning to eliminate gradient communication entirely. Experimental results demonstrate that the method achieves accuracy comparable to standard split learning on CIFAR-10 and CIFAR-100, while reducing communication volume by approximately 50% and lowering peak memory usage by up to 58%.

๐Ÿ“ Abstract
Split learning is a distributed training paradigm in which a neural network is partitioned between clients and a server, allowing data to remain at the client while only intermediate activations are shared. Traditional split learning relies on end-to-end backpropagation across the client-server split point. This incurs a large communication overhead (forward activations and backward gradients must be exchanged every iteration) and significant memory use (for storing activations and gradients). In this paper, we develop a beyond-backpropagation training method for split learning. In this approach, the client and server train their model partitions semi-independently, using local loss signals instead of propagated gradients. In particular, the client's network is augmented with a small auxiliary classifier at the split point to provide a local error signal, while the server trains on the client's transmitted activations using the true loss function. This decoupling removes the need to send backward gradients, which cuts communication costs roughly in half and also reduces memory overhead (each side stores only the local activations needed for its own backward pass). We evaluate our approach on CIFAR-10 and CIFAR-100. Our experiments show two key results. First, the proposed approach achieves performance on par with standard split learning that uses backpropagation. Second, it significantly reduces communication (the transmission of activations and gradients) by 50% and peak memory usage by up to 58%.
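The decoupled scheme described above can be sketched in a few lines: the client trains its partition against a small auxiliary classifier at the split point, and the server trains on the received (detached) activations with the true labels, so no gradient ever crosses the split. The following NumPy toy is an illustrative sketch only; all shapes, learning rates, and the synthetic data are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-class data (illustrative; the paper uses CIFAR-10/100).
X = rng.normal(size=(64, 8))
y = (X @ rng.normal(size=(8,)) > 0).astype(int)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def xent(p, y):
    return -np.log(p[np.arange(len(y)), y] + 1e-12).mean()

# Client partition: one hidden layer plus an auxiliary head at the split point.
W1 = rng.normal(scale=0.1, size=(8, 16))   # client body
A = rng.normal(scale=0.1, size=(16, 2))    # auxiliary classifier (local loss)
# Server partition: classifier trained on the received activations.
W2 = rng.normal(scale=0.1, size=(16, 2))

lr = 0.2
first_loss = last_loss = None
for step in range(200):
    # Client: forward to the split point; h is what gets transmitted.
    h = np.maximum(X @ W1, 0)

    # Client: local update via the auxiliary loss -- no server gradient used.
    p_aux = softmax(h @ A)
    g_aux = p_aux.copy()
    g_aux[np.arange(len(y)), y] -= 1
    g_aux /= len(y)
    grad_A = h.T @ g_aux
    grad_h = g_aux @ A.T
    grad_h[h <= 0] = 0                     # ReLU backward, client-side only
    W1 -= lr * (X.T @ grad_h)
    A -= lr * grad_A

    # Server: trains on the detached activations with the true labels.
    p = softmax(h @ W2)
    loss = xent(p, y)
    g = p.copy()
    g[np.arange(len(y)), y] -= 1
    g /= len(y)
    W2 -= lr * (h.T @ g)                   # gradient never crosses the split

    if first_loss is None:
        first_loss = loss
    last_loss = loss

print(f"server loss: {first_loss:.3f} -> {last_loss:.3f}")
```

Note the communication pattern this implies: only `h` travels client-to-server each round, with no backward message, which is the source of the roughly 50% communication reduction the paper reports.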
Problem

Research questions and friction points this paper is trying to address.

split learning
communication overhead
memory usage
backpropagation
distributed training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Split Learning
Decoupled Training
Auxiliary Loss
Communication Efficiency
Memory Reduction
๐Ÿ”Ž Similar Papers
No similar papers found.
Anower Zihad
School of Computing, Montclair State University, New Jersey, USA
Felix Owino
School of Computing, Montclair State University, New Jersey, USA
Haibo Yang
Rochester Institute of Technology
Federated Learning, Optimization, Machine Learning
Ming Tang
Southern University of Science and Technology
Chao Huang
Montclair State University
split learning, machine learning, network economics