Efficient Resource-Constrained Training of Vision Transformers via Subspace Optimization

📅 2025-10-10
🤖 AI Summary
To address the challenge of training vision Transformers (ViTs) on resource-constrained edge devices, this paper proposes Weight-Activation Subspace Iteration (WASI), the first method to introduce subspace optimization into ViT training. WASI jointly models the low-rank structure of both weights and activations, constraining gradient computation and storage to a dynamically updated joint subspace during backpropagation, thereby significantly alleviating the memory bottleneck. WASI reduces peak training memory by up to 62× and computational cost (FLOPs) by up to 2× while matching the accuracy of vanilla training; on a Raspberry Pi 5 it delivers roughly 1.5× faster end-to-end training and inference. This work establishes a scalable, high-fidelity paradigm for lightweight ViT training directly on edge devices.
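The core idea, restricting a layer's activations and gradients to a low-dimensional subspace tracked by iterative orthogonalization, can be illustrated with a generic subspace-iteration sketch in numpy. This is not the authors' WASI implementation; the matrix sizes, rank `k`, and the use of plain orthogonal iteration on `W Wᵀ` are illustrative assumptions:

```python
import numpy as np

def subspace_iteration(W, k, n_iter=10, seed=0):
    """Orthogonal (subspace) iteration: returns an orthonormal basis U
    (d x k) approximating the top-k left singular subspace of W."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((W.shape[0], k)))
    for _ in range(n_iter):
        # Multiply by W W^T, then re-orthonormalize with QR so the
        # columns converge toward the dominant left singular vectors.
        U, _ = np.linalg.qr(W @ (W.T @ U))
    return U

rng = np.random.default_rng(1)
# Toy weight matrix that is (nearly) rank-8: low-rank part plus noise.
A = rng.standard_normal((256, 8)) @ rng.standard_normal((8, 128))
W = A + 0.01 * rng.standard_normal((256, 128))

U = subspace_iteration(W, k=8)        # 256 x 8 orthonormal basis
x = rng.standard_normal(128)
a = W @ x                             # full activation: 256 floats
coeff = U.T @ a                       # stored in the subspace: 8 floats
a_hat = U @ coeff                     # reconstruction from the subspace
rel_err = np.linalg.norm(a - a_hat) / np.linalg.norm(a)
print(rel_err)  # small, since a lies almost entirely in span(U)
```

Storing the 8 subspace coefficients instead of the 256-dimensional activation is the kind of saving that, applied across all transformer layers during backpropagation, drives the memory reductions reported above.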

📝 Abstract
As AI increasingly shapes daily life, energy consumption and data privacy have become pressing concerns. On-device learning trains models directly on edge devices, cutting energy consumption and safeguarding data privacy. However, the expanding scale of modern neural networks creates a major obstacle for on-device training. Although prior work has concentrated on compact convolutional architectures, we instead apply subspace-based training to transformer models. Motivated by the idea that a model's essential information lies in a fixed subspace, we introduce Weight-Activation Subspace Iteration (WASI), a method that mitigates the memory bottleneck of backpropagation and boosts inference efficiency in transformer models by restricting training to this subspace. Our results demonstrate that WASI maintains accuracy comparable to vanilla training while reducing memory usage by up to $62\times$ and computational cost (FLOPs) by up to $2\times$. On a Raspberry Pi 5, WASI achieves roughly $1.5\times$ faster training and inference than vanilla training.
Problem

Research questions and friction points this paper is trying to address.

Reducing memory usage in transformer training via subspace optimization
Accelerating on-device learning while maintaining model accuracy
Overcoming computational bottlenecks for edge device deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Subspace optimization reduces transformer memory usage
Weight-Activation Subspace Iteration enables efficient training
Method maintains accuracy while cutting computational costs
Le-Trung Nguyen
LTCI, Télécom Paris, Institut Polytechnique de Paris

Enzo Tartaglione
Associate Professor, Télécom Paris, Institut Polytechnique de Paris
deep learning, compression, pruning, debiasing, frugal AI

Van-Tam Nguyen
LTCI, Télécom Paris, Institut Polytechnique de Paris