Towards Data-free and Training-free Compression for Speech Foundation Models Using Parameter Clustering

📅 2026-06-10
🤖 AI Summary
This study addresses how to compress speech foundation models efficiently without access to data and without retraining, while preserving speech recognition performance. The authors propose a data-free and training-free compression method based on channelwise k-means clustering of model parameters, which they present as novel. They also explore a finer-grained, mixed-sparsity variant in which the number of parameter clusters varies from layer to layer. On LibriSpeech with HuBERT-large at 50% sparsity, the method lowers word error rate (WER) by 27.73%/18.61% absolute on test-clean/test-other compared with magnitude-based pruning before fine-tuning, and by 0.19%/0.79% absolute after three epochs of fine-tuning. On Whisper-large-v3 at 10% sparsity, it lowers WER by 2.86%/5.02% absolute compared with magnitude-based pruning, with no significant WER increase relative to the uncompressed model.
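To make the idea of clustering channels concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that each row of a layer's weight matrix is one channel, that a target sparsity `s` maps to roughly `(1 - s)` times the channel count in clusters, and that every channel is replaced by its cluster centroid. The helper names (`clusters_for_sparsity`, `kmeans_channels`, `compress_layer`) are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vec(rows):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def clusters_for_sparsity(num_channels, sparsity):
    """Assumed mapping: keep about a (1 - sparsity) fraction of channels as clusters."""
    return max(1, round(num_channels * (1.0 - sparsity)))


def kmeans_channels(weight, k, iters=50):
    """Cluster the rows (channels) of `weight` into k groups with Lloyd's algorithm.

    Uses deterministic farthest-point initialisation, so only the weights
    themselves are needed. Assumes 1 <= k <= len(weight).
    """
    centroids = [list(weight[0])]
    while len(centroids) < k:
        # Next seed: the channel farthest from all centroids chosen so far.
        far = max(weight, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid for every channel.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(row, centroids[j]))
            for row in weight
        ]
        if new_labels == labels:  # assignments stable, so converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [row for row, lab in zip(weight, labels) if lab == j]
            if members:
                centroids[j] = mean_vec(members)
    return centroids, labels


def compress_layer(weight, sparsity):
    """Return (centroids, labels) for one layer at the requested sparsity."""
    k = clusters_for_sparsity(len(weight), sparsity)
    return kmeans_channels(weight, k)


# Toy layer with four channels that fall into two obvious groups.
toy_weight = [[1.0, 1.0], [1.1, 0.9], [-1.0, -1.0], [-0.9, -1.1]]
centroids, labels = compress_layer(toy_weight, sparsity=0.5)
reconstructed = [centroids[lab] for lab in labels]
```

Under the same assumptions, a mixed-sparsity variant would call `compress_layer` with a different `sparsity` value for each layer, so that each layer gets its own cluster count.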
📝 Abstract
This paper presents a novel data-free and training-free compression approach for speech foundation models using channelwise clustering via k-means. More fine-grained, mixed-sparsity pruning that varies the number of parameter clusters at the layer level is also explored. Experiments conducted on the LibriSpeech dataset suggest that, at a pruning sparsity of 50% on HuBERT-large, consistent WER reductions of 27.73%/18.61% absolute (34.37%/21.91% relative) over magnitude-based pruning were obtained on the test-clean and test-other subsets before fine-tuning, and of 0.19%/0.79% absolute (3.36%/4.62% relative) after fine-tuning for only 3 epochs. Similar WER reductions of 2.86%/5.02% absolute (59.21%/55.29% relative) were observed against magnitude-based pruning on Whisper-large-v3 at 10% sparsity, all with no significant WER increase relative to the uncompressed baseline.
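The abstract quotes each WER reduction as both an absolute and a relative figure. The short sketch below shows how the two relate: dividing the absolute reduction by the relative reduction gives the WER of the magnitude-based pruning reference that the pair implies. The resulting values are derived by arithmetic from the quoted numbers and are not reported in the abstract itself. The function name `implied_baseline_wer` is hypothetical.

```python
def implied_baseline_wer(abs_reduction, rel_reduction_pct):
    """WER (%) of the reference system implied by an absolute/relative reduction pair.

    relative = absolute / baseline, hence baseline = absolute / relative.
    """
    return abs_reduction / (rel_reduction_pct / 100.0)


# (absolute reduction, relative reduction in %) as quoted in the abstract.
quoted = {
    "HuBERT-large, 50% sparsity, test-clean, before fine-tuning": (27.73, 34.37),
    "HuBERT-large, 50% sparsity, test-other, before fine-tuning": (18.61, 21.91),
    "Whisper-large-v3, 10% sparsity, first quoted subset": (2.86, 59.21),
    "Whisper-large-v3, 10% sparsity, second quoted subset": (5.02, 55.29),
}

for setting, (abs_red, rel_red) in quoted.items():
    baseline = implied_baseline_wer(abs_red, rel_red)
    # Implied WER of the clustering method = baseline minus the absolute reduction.
    print(f"{setting}: magnitude pruning ~{baseline:.2f}%, "
          f"clustering ~{baseline - abs_red:.2f}%")
```

This arithmetic also explains why a large absolute reduction can pair with a moderate relative one: at 50% sparsity without fine-tuning, the implied magnitude-pruning WER on HuBERT-large is itself very high.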
Problem

Research questions and friction points this paper is trying to address.

data-free compression
training-free compression
speech foundation models
parameter clustering
model pruning
Innovation

Methods, ideas, or system contributions that make the work stand out.

data-free compression
training-free pruning
parameter clustering
channelwise k-means
mixed sparsity