Alternating Approach-Putt Models for Multi-Stage Speech Enhancement

📅 2025-08-14
🤖 AI Summary
Speech enhancement models often introduce distortion artifacts during denoising, degrading perceptual quality and intelligibility. To address this, we propose an alternating multi-stage enhancement framework that pairs a primary enhancement network with a novel post-processing module, PuttNet. Named after the putt that follows an approach shot in golf, PuttNet performs fine-grained correction of the residual distortions left by the enhancement network. Crucially, PuttNet and the primary network are applied in alternation, which avoids the error accumulation caused by repeatedly applying a single model. The method improves perceptual quality (PESQ), objective intelligibility (STOI), and background noise intrusiveness (CBAK) scores. To our knowledge, this is the first work to introduce a putting-inspired, progressive refinement mechanism into speech enhancement post-processing.

📝 Abstract
Speech enhancement using artificial neural networks aims to remove noise from noisy speech signals while preserving the speech content. However, speech enhancement networks often introduce distortions to the speech signal, referred to as artifacts, which can degrade audio quality. In this work, we propose a post-processing neural network designed to mitigate artifacts introduced by speech enhancement models. Inspired by the analogy of making a 'Putt' after an 'Approach' in golf, we name our model PuttNet. We demonstrate that alternating between a speech enhancement model and the proposed Putt model leads to improved speech quality, as measured by perceptual quality scores (PESQ), objective intelligibility (STOI), and background noise intrusiveness (CBAK) scores. Furthermore, we illustrate with graphical analysis why this alternating Approach outperforms repeated application of either model alone.
Problem

Research questions and friction points this paper is trying to address.

Mitigate artifacts in speech enhancement models
Improve speech quality via alternating neural networks
Enhance perceptual and intelligibility scores effectively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Post-processing neural network reduces artifacts
Alternates enhancement and PuttNet for better quality
Improves PESQ, STOI, and CBAK scores
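
The alternation described above can be sketched as a simple loop. This is only an illustrative outline, not the authors' implementation: the function name `alternate`, the `rounds` parameter, and the two toy stand-in models are hypothetical, and this page does not state the real network architectures, training procedure, or number of alternations.

```python
from typing import Callable, List

Signal = List[float]
Model = Callable[[Signal], Signal]


def alternate(noisy: Signal, approach: Model, putt: Model, rounds: int = 2) -> List[Signal]:
    """Apply `approach` then `putt`, `rounds` times, keeping every intermediate signal.

    `approach` stands in for the speech enhancement model and `putt` for the
    post-processing model; both are assumed to map a signal to a signal.
    """
    history = [list(noisy)]
    x = list(noisy)
    for _ in range(rounds):
        x = approach(x)   # enhancement step (may introduce artifacts)
        history.append(x)
        x = putt(x)       # post-processing step (meant to reduce artifacts)
        history.append(x)
    return history


# Toy stand-ins (hypothetical): plain arithmetic, not real audio models.
def toy_approach(x: Signal) -> Signal:
    return [0.5 * v for v in x]


def toy_putt(x: Signal) -> Signal:
    return [v + 1.0 for v in x]


history = alternate([4.0], toy_approach, toy_putt, rounds=2)
```

Keeping the intermediate signals in `history` makes it possible to score each stage separately, for example to compare alternation against applying only one of the two models repeatedly.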
Iksoon Jeong
Department of Physics, Chungnam National University, Daejeon 34134, Republic of Korea
Kyung-Joong Kim
Professor, Department of AI Convergence, GIST
Artificial Intelligence, Games, Game AI
Kang-Hun Ahn
Department of Physics, Chungnam National University, Daejeon 34134, Republic of Korea, and Hearing Loss Research Lab., Deep Hearing Corp., Republic of Korea