Unfolding A Few Structures for The Many: Memory-Efficient Compression of Conformer and Speech Foundation Models

📅 2025-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Conformer and speech foundation models (e.g., wav2vec 2.0) incur large memory and storage overhead, which hinders efficient deployment. Method: The paper proposes a “small-to-large” unfoldable compression paradigm: a lightweight seed model is unfolded to multiple logical depths through structured parameter sharing, supporting on-demand deployment. The seed model and its unfolded paths are trained jointly within a single unfolding cycle, and KL-based self-distillation between the seed and the fully unfolded model narrows the performance gap between them. Contribution/Results: The method reduces parameter counts by 35% for Conformer and 30% for wav2vec 2.0 without ASR performance degradation, and it lowers training GPU memory, inference memory footprint, and model storage requirements.

📝 Abstract

This paper presents a novel memory-efficient model compression approach for Conformer ASR and speech foundation systems. Our approach features a unique "small-to-large" design. A compact "seed" model containing a few Conformer or Transformer blocks is trained and unfolded many times to emulate the performance of larger uncompressed models with different logical depths. The seed model and many unfolded paths are jointly trained within a single unfolding cycle. The KL-divergence between the largest unfolded and smallest seed models is used in a self-distillation process to minimize their performance disparity. Experimental results show that our foldable model produces ASR performance comparable to individually constructed Conformer and wav2vec2/HuBERT speech foundation models under various depth configurations, while requiring only minimal memory and storage. Conformer and wav2vec2 models with 35% and 30% fewer parameters, respectively, are obtained without loss of performance.
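The unfolding idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the scalar `make_block` layers stand in for Conformer or Transformer blocks, and the cyclic repetition order used in the hypothetical `unfold` helper is an assumption, since the abstract does not specify how the seed blocks are ordered along an unfolded path. The point it shows is that a deeper logical path reuses the same block objects, so logical depth grows while the number of distinct parameter sets does not.

```python
import math
from typing import Callable, List

Vector = List[float]
Block = Callable[[Vector], Vector]


def make_block(scale: float, shift: float) -> Block:
    """Toy stand-in for a Conformer/Transformer block (hypothetical):
    a residual tanh layer with two scalar parameters."""
    def block(x: Vector) -> Vector:
        return [v + math.tanh(scale * v + shift) for v in x]
    return block


def unfold(seed_blocks: List[Block], times: int) -> List[Block]:
    """Build a logical layer sequence by repeating the seed blocks.

    The same block objects are reused on every repetition, so no new
    parameters are created. Cyclic ordering is an assumption.
    """
    return [blk for _ in range(times) for blk in seed_blocks]


def forward(layers: List[Block], x: Vector) -> Vector:
    """Apply the layers of one unfolded path in order."""
    for layer in layers:
        x = layer(x)
    return x


# A seed with two blocks, unfolded once (the seed itself) and three times.
seed = [make_block(0.5, 0.1), make_block(-0.3, 0.2)]
path_1x = unfold(seed, 1)
path_3x = unfold(seed, 3)

logical_depths = (len(path_1x), len(path_3x))      # layers actually executed
unique_blocks = len({id(b) for b in path_3x})      # distinct parameter sets

out_seed = forward(path_1x, [0.2, -0.4])
out_deep = forward(path_3x, [0.2, -0.4])
```

In this sketch the three-times-unfolded path runs six layers but still stores only the two seed blocks, which is the storage saving the abstract describes. Any of the unfolded paths could be selected at deployment time from the same stored weights.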

Problem

Research questions and friction points this paper is trying to address.

Memory-efficient compression for Conformer ASR models

Reducing parameters without performance loss in speech models

Self-distillation to minimize performance disparity in unfolded models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Small-to-large model unfolding design

Joint training of seed and unfolded paths

Self-distillation using KL-divergence minimization
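A minimal sketch of a KL-based self-distillation term between the deepest unfolded path and the seed path follows. It is an illustration under assumptions, not the paper's loss: the direction KL(deep || seed), which treats the deepest path as the teacher, the use of plain softmax over output logits, and the function names are all hypothetical choices made here.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def self_distillation_loss(deep_logits: List[float],
                           seed_logits: List[float]) -> float:
    """KL between the output distributions of the deepest unfolded path
    (teacher, assumed) and the seed path (student, assumed)."""
    teacher = softmax(deep_logits)
    student = softmax(seed_logits)
    return kl_divergence(teacher, student)


# Identical outputs give zero loss; diverging outputs give a positive loss.
loss_same = self_distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
loss_diff = self_distillation_loss([2.0, 0.5, -1.0], [0.1, 0.2, 0.3])
```

In joint training, a term like this would be added to the ASR losses of the individual paths so that the seed's predictions are pulled toward those of the fully unfolded model. How the terms are weighted is not stated in the text above.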
