🤖 AI Summary
Diffusion models for inverse problems face two key challenges: the distortion–perception trade-off and exposure bias. To address these, we propose Regularized Schrödinger Bridge (RSB), a novel framework that jointly perturbs both observations and target variables, introduces a posterior-mean interpolation path, and explicitly models cumulative prediction error during training. Theoretically, RSB unifies forward diffusion and reverse inference under coupled constraints derived from optimal transport. Practically, it achieves superior balance between reconstruction fidelity and perceptual quality. Evaluated on speech enhancement, RSB significantly reduces objective distortion metrics (e.g., PESQ and WER) while improving subjective listening scores—outperforming state-of-the-art diffusion-based baselines and conventional supervised methods. These results demonstrate RSB’s robustness and generalization capability across diverse degradation scenarios.
📝 Abstract
Diffusion models serve as a powerful generative framework for solving inverse problems. However, they still face two key challenges: 1) the distortion-perception tradeoff, where improving perceptual quality often degrades reconstruction fidelity, and 2) the exposure bias problem, where the training-inference input mismatch leads to prediction error accumulation and reduced reconstruction quality. In this work, we propose the Regularized Schrödinger Bridge (RSB), an adaptation of Schrödinger Bridge tailored for inverse problems that addresses the above limitations. RSB employs a novel regularized training strategy that perturbs both the input states and targets, effectively mitigating exposure bias by exposing the model to simulated prediction errors and also alleviating distortion by well-designed interpolation via the posterior mean. Extensive experiments on two typical inverse problems for speech enhancement demonstrate that RSB outperforms state-of-the-art methods, significantly improving distortion metrics and effectively reducing exposure bias.