DDSP-QbE++: Improving Speech Quality for Speech Anonymisation for Atypical Speech

📅 2026-04-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses high-frequency aliasing artifacts, heard as buzzing and spectral distortion, that arise from discontinuities in the sawtooth-like excitation used in DDSP-based query-by-example voice conversion (DDSP-QbE). The artifacts are most pronounced at high fundamental frequencies and degrade speech naturalness. The authors propose two changes to the excitation stage. First, explicit voicing detection gates the harmonic excitation, so that unvoiced regions are synthesized from filtered noise instead of a periodic signal. Second, PolyBLEP correction smooths the waveform discontinuity at each phase wrap-around, suppressing aliasing without oversampling. Combining voicing-gated excitation with PolyBLEP anti-aliasing in a DDSP framework for voice anonymization yields a cleaner harmonic roll-off and improved perceptual naturalness, as measured by MOS, without adding learnable parameters. The resulting method is lightweight and differentiable.
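The voicing gate described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `gated_excitation`, the per-sample boolean mask `voiced`, and the moving-average filter standing in for the "filtered noise" are all assumptions made for illustration, and the hard switch between branches is a simplification.

```python
import numpy as np

def gated_excitation(f0, voiced, sr, seed=0, smooth_len=5):
    """Hypothetical voicing-gated excitation (illustrative only).

    f0:     per-sample fundamental frequency in Hz
    voiced: per-sample boolean mask from any voicing detector (assumed given)
    sr:     sample rate in Hz
    """
    f0 = np.asarray(f0, dtype=float)
    voiced = np.asarray(voiced, dtype=bool)

    # Harmonic branch: phase accumulation gives a naive sawtooth in [-1, 1).
    phase = np.cumsum(f0 / sr) % 1.0
    harmonic = 2.0 * phase - 1.0

    # Noise branch: white noise passed through a moving-average filter.
    # The filter is a placeholder assumption, not the filter used in the paper.
    rng = np.random.default_rng(seed)
    white = rng.uniform(-1.0, 1.0, size=f0.shape)
    kernel = np.ones(smooth_len) / smooth_len
    noise = np.convolve(white, kernel, mode="same")

    # Gate: periodic excitation where voiced, filtered noise elsewhere.
    return np.where(voiced, harmonic, noise)
```

For example, calling `gated_excitation(np.full(1600, 200.0), np.arange(1600) < 800, 16000)` returns a sawtooth for the first 800 samples and filtered noise for the rest. A practical system would likely cross-fade between the two branches to avoid clicks at voicing boundaries.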

📝 Abstract

Differentiable Digital Signal Processing (DDSP) pipelines for voice conversion rely on subtractive synthesis, where a periodic excitation signal is shaped by a learned spectral envelope to reconstruct the target voice. In DDSP-QbE, the excitation is generated via phase accumulation, producing a sawtooth-like waveform whose abrupt discontinuities introduce aliasing artefacts that manifest perceptually as buzziness and spectral distortion, particularly at higher fundamental frequencies. We propose two targeted improvements to the excitation stage of the DDSP-QbE subtractive synthesizer. First, we incorporate explicit voicing detection to gate the harmonic excitation, suppressing the periodic component in unvoiced regions and replacing it with filtered noise, thereby avoiding aliased harmonic content where it is most perceptually disruptive. Second, we apply Polynomial Band-Limited Step (PolyBLEP) correction to the phase-accumulated oscillator, substituting the hard waveform discontinuity at each phase wrap with a smooth polynomial residual that cancels alias-generating components without oversampling or spectral truncation. Together, these modifications yield a cleaner harmonic roll-off, reduced high-frequency artefacts, and improved perceptual naturalness, as measured by MOS. The proposed approach is lightweight, differentiable, and integrates seamlessly into the existing DDSP-QbE training pipeline with no additional learnable parameters.
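To make the PolyBLEP step concrete, the following NumPy sketch applies the widely used two-sample polynomial correction to a phase-accumulated sawtooth. It is a generic textbook-style formulation under stated assumptions, not the paper's code: the function names `poly_blep` and `sawtooth` are hypothetical, the phase increment is assumed to stay below 0.5 cycles per sample, and NumPy is used only for readability (the same arithmetic would need to be written in an autodiff framework to be differentiable).

```python
import numpy as np

def poly_blep(t, dt):
    """Polynomial band-limited step residual around a phase wrap.

    t:  phase in [0, 1)
    dt: phase increment per sample (f0 / sr), assumed < 0.5
    """
    out = np.zeros_like(t)

    # Sample just after the wrap: t in [0, dt)
    after = t < dt
    x = t[after] / dt[after]
    out[after] = 2.0 * x - x * x - 1.0

    # Sample just before the wrap: t in (1 - dt, 1)
    before = t > 1.0 - dt
    x = (t[before] - 1.0) / dt[before]
    out[before] = x * x + 2.0 * x + 1.0
    return out

def sawtooth(f0, sr, antialias=True):
    """Phase-accumulated sawtooth, optionally PolyBLEP-corrected."""
    f0 = np.asarray(f0, dtype=float)
    dt = f0 / sr
    phase = np.cumsum(dt) % 1.0
    naive = 2.0 * phase - 1.0          # hard discontinuity at each wrap
    if not antialias:
        return naive
    # Subtract the residual so the step is spread over two samples.
    return naive - poly_blep(phase, dt)
```

Only the two samples adjacent to each wrap are modified; all other samples are identical to the naive sawtooth. This is why the correction needs neither oversampling nor spectral truncation, and why it adds no learnable parameters.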

Problem

Research questions and friction points this paper is trying to address.

speech anonymisation

atypical speech

aliasing artefacts

voice conversion

excitation signal

Innovation

Methods, ideas, or system contributions that make the work stand out.

Differentiable Digital Signal Processing

PolyBLEP

voicing detection