Compose Yourself: Average-Velocity Flow Matching for One-Step Speech Enhancement

📅 2025-09-19
🤖 AI Summary
Diffusion and flow-matching models for speech enhancement rely on multi-step sampling, which is computationally expensive and sensitive to discretization error. To address this, the paper proposes COSE, a one-step generative framework. Its core idea is to reformulate the dynamics through an average velocity field, computed efficiently with a velocity composition identity that avoids the costly Jacobian-vector products (JVPs) used in MeanFlow training. The approach preserves theoretical consistency and competitive enhancement quality while lowering training and inference cost. On standard benchmarks, COSE achieves up to 5× faster sampling and reduces training cost by 40% without compromising speech quality.

📝 Abstract
Diffusion and flow matching (FM) models have achieved remarkable progress in speech enhancement (SE), yet their dependence on multi-step generation is computationally expensive and vulnerable to discretization errors. Recent advances in one-step generative modeling, particularly MeanFlow, provide a promising alternative by reformulating dynamics through average velocity fields. In this work, we present COSE, a one-step FM framework tailored for SE. To address the high training overhead of Jacobian-vector product (JVP) computations in MeanFlow, we introduce a velocity composition identity to compute average velocity efficiently, eliminating expensive computation while preserving theoretical consistency and achieving competitive enhancement quality. Extensive experiments on standard benchmarks show that COSE delivers up to 5x faster sampling and reduces training cost by 40%, all without compromising speech quality. Code is available at https://github.com/ICDM-UESTC/COSE.
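
The contrast between multi-step sampling and one-step sampling with an average velocity can be illustrated on a toy problem. The sketch below is not the COSE model: it assumes a hypothetical one-dimensional velocity field `dz/dt = -z`, for which the average velocity over an interval has a closed form, and it assumes the MeanFlow-style sampling convention `z_r = z_t - (t - r) * u(z_t, r, t)`. All function names are illustrative. In a real system the average velocity would be predicted by a trained network rather than computed analytically.

```python
import math


def velocity(z, t):
    """Toy instantaneous velocity field dz/dt = -z (an assumption for illustration)."""
    return -z


def euler_sample(z_start, t_start, t_end, num_steps):
    """Multi-step sampling: explicit Euler integration from t_start to t_end."""
    z = z_start
    t = t_start
    dt = (t_end - t_start) / num_steps
    for _ in range(num_steps):
        z = z + dt * velocity(z, t)
        t = t + dt
    return z


def average_velocity(z_t, r, t):
    """Exact average velocity over [r, t] for the toy field, given the state at time t.

    For dz/dt = -z the trajectory through (t, z_t) satisfies z(r) = z_t * exp(t - r),
    and the average velocity is the displacement divided by the elapsed time.
    """
    z_r = z_t * math.exp(t - r)
    return (z_t - z_r) / (t - r)


def one_step_sample(z_t, r, t):
    """One-step sampling: jump from time t to time r using the average velocity."""
    return z_t - (t - r) * average_velocity(z_t, r, t)


if __name__ == "__main__":
    exact = math.e  # true state at time 0 when starting from z = 1.0 at time 1
    for steps in (1, 5, 50):
        approx = euler_sample(1.0, 1.0, 0.0, steps)
        print(f"Euler, {steps:>2} steps: error = {abs(approx - exact):.6f}")
    one_step = one_step_sample(1.0, 0.0, 1.0)
    print(f"Average velocity, 1 step: error = {abs(one_step - exact):.6f}")
```

In this toy setting the Euler error shrinks only as the number of steps grows, while a single jump with the exact average velocity lands on the true endpoint. That is the motivation for learning the average velocity directly instead of the instantaneous one.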
Problem

Research questions and friction points this paper is trying to address.

One-step speech enhancement with reduced computational cost
Efficient average velocity computation avoiding Jacobian-vector products
Maintaining speech quality while accelerating sampling and training
Innovation

Methods, ideas, or system contributions that make the work stand out.

One-step flow matching framework
Velocity composition identity for computing average velocity
Average velocity calculation without Jacobian-vector products
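
The summary does not state the velocity composition identity explicitly. A minimal sketch, assuming it takes the additive form that follows from defining average velocity as displacement over elapsed time, is `(t - r) * u(z_t, r, t) = (s - r) * u(z_s, r, s) + (t - s) * u(z_t, s, t)` for any intermediate time `s`. The code below checks this relation numerically on the same hypothetical toy field `dz/dt = -z`. The function names and the toy field are assumptions, not the paper's implementation.

```python
import math


def flow(z, t_from, t_to):
    """Exact state at t_to given state z at t_from for the toy field dz/dt = -z."""
    return z * math.exp(-(t_to - t_from))


def average_velocity(z_t, r, t):
    """Average velocity over [r, t]: displacement divided by elapsed time."""
    z_r = flow(z_t, t, r)
    return (z_t - z_r) / (t - r)


def composed_average_velocity(z_t, r, s, t):
    """Average velocity over [r, t] assembled from the sub-intervals [r, s] and [s, t].

    Only evaluations of the average velocity are needed, with no derivative
    of it with respect to time (and hence no Jacobian-vector product).
    """
    u_st = average_velocity(z_t, s, t)
    z_s = z_t - (t - s) * u_st  # state at the intermediate time s
    u_rs = average_velocity(z_s, r, s)
    return ((s - r) * u_rs + (t - s) * u_st) / (t - r)


if __name__ == "__main__":
    z_t, r, s, t = 0.7, 0.1, 0.4, 0.9
    direct = average_velocity(z_t, r, t)
    composed = composed_average_velocity(z_t, r, s, t)
    print(f"direct   = {direct:.12f}")
    print(f"composed = {composed:.12f}")
```

Under this assumed form, a training target for a long interval could be built from predictions on two shorter intervals using forward evaluations alone, which is consistent with the stated goal of avoiding JVP computation.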
Authors
Gang Yang
University of Electronic Science and Technology of China, Chengdu, Sichuan, China
Yue Lei
University of Electronic Science and Technology of China, Chengdu, Sichuan, China
Wenxin Tai
University of Electronic Science and Technology of China
Trustworthy AI
Jin Wu
University of Electronic Science and Technology of China, Chengdu, Sichuan, China
Jia Chen
University of Electronic Science and Technology of China, Chengdu, Sichuan, China
Ting Zhong
University of Electronic Science and Technology of China
Deep Learning, Social Networks
Fan Zhou
University of Electronic Science and Technology of China, Chengdu, Sichuan, China