AI Summary
This work addresses the high communication overhead and memory consumption of conventional split learning, which relies on end-to-end backpropagation and therefore transmits both forward activations and backward gradients in every round. The authors propose a decoupled split learning approach that introduces an auxiliary classifier at the client-side split point to provide local loss signals, while the server trains on the received activations using ground-truth labels. This design enables semi-independent model updates on both sides without exchanging backward gradients, the first such mechanism in split learning to eliminate gradient communication entirely. Experimental results demonstrate that the method matches the accuracy of standard split learning on CIFAR-10 and CIFAR-100 while reducing communication volume by approximately 50% and lowering peak memory usage by up to 58%.
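As a rough illustration of where the ~50% figure comes from, here is a back-of-the-envelope sketch in Python; the tensor shape and dtype are hypothetical placeholders, not values from the paper. In standard split learning, the backward gradient at the split point has the same shape as the forward activation, so dropping it roughly halves per-round traffic (labels sent alongside activations are negligible by comparison).

```python
# Hypothetical accounting for per-round communication at the split point.
# The batch size, feature-map shape, and dtype below are illustrative only.
batch, channels, height, width = 128, 64, 16, 16
bytes_per_value = 4  # float32

activation_bytes = batch * channels * height * width * bytes_per_value

# Standard split learning: activations go up, same-shaped gradients come back.
standard_per_round = 2 * activation_bytes
# Decoupled variant: only activations are sent; no backward gradients.
decoupled_per_round = activation_bytes

print(f"standard:  {standard_per_round / 2**20:.1f} MiB/round")   # 16.0 MiB
print(f"decoupled: {decoupled_per_round / 2**20:.1f} MiB/round")  # 8.0 MiB
print(f"saving:    {1 - decoupled_per_round / standard_per_round:.0%}")  # 50%
```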
Abstract
Split learning is a distributed training paradigm in which a neural network is partitioned between clients and a server, allowing data to remain on the client while only intermediate activations are shared. Traditional split learning relies on end-to-end backpropagation across the client-server split point, which incurs large communication overhead (forward activations and backward gradients must be exchanged every iteration) and significant memory use (both sides store activations and gradients for the global backward pass). In this paper, we develop a beyond-backpropagation training method for split learning in which the client and server train their model partitions semi-independently, using local loss signals instead of propagated gradients. In particular, the client's network is augmented with a small auxiliary classifier at the split point that provides a local error signal, while the server trains on the client's transmitted activations using the true loss function. This decoupling removes the need to send backward gradients, cutting communication costs roughly in half and reducing memory overhead, since each side stores only the activations needed for its own backward pass. We evaluate our approach on CIFAR-10 and CIFAR-100. Our experiments show two key results. First, the proposed approach matches the accuracy of standard split learning with backpropagation. Second, by eliminating gradient transmission it reduces communication by 50% and cuts peak memory usage by up to 58%.
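To make the decoupled update concrete, the following is a minimal PyTorch sketch of one training round under our reading of the abstract. All module names (client_net, aux_head, server_net), architectures, and hyperparameters are hypothetical placeholders rather than the paper's actual models, and the sketch assumes labels are shared with the server so it can compute the true loss.

```python
# Minimal sketch of one decoupled split-learning round (hypothetical models).
import torch
import torch.nn as nn

# Client-side partition plus a small auxiliary classifier at the split point.
client_net = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
aux_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10))
# Server-side partition, trained on received activations with the true loss.
server_net = nn.Sequential(nn.Flatten(), nn.Linear(64 * 32 * 32, 10))

opt_client = torch.optim.SGD(
    list(client_net.parameters()) + list(aux_head.parameters()), lr=0.1)
opt_server = torch.optim.SGD(server_net.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

def train_round(x, y):
    # --- Client side: local error signal from the auxiliary classifier ---
    activations = client_net(x)
    local_loss = criterion(aux_head(activations), y)
    opt_client.zero_grad()
    local_loss.backward()        # backward pass stays entirely on the client
    opt_client.step()

    # --- Communication: only detached activations (plus labels) go up ---
    sent = activations.detach()  # no gradient will ever flow back through this

    # --- Server side: true loss on the received activations ---
    server_loss = criterion(server_net(sent), y)
    opt_server.zero_grad()
    server_loss.backward()       # gradients stop at `sent`; nothing is returned
    opt_server.step()
    return local_loss.item(), server_loss.item()

x = torch.randn(8, 3, 32, 32)           # stand-in for a CIFAR-style batch
y = torch.randint(0, 10, (8,))
print(train_round(x, y))
```

Note how `detach()` is what severs the two backward passes: the server's gradient computation terminates at the received tensor, so no gradient message crosses the split, and each side retains only the activations its own local backward pass requires.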