A Statistics-Driven Differentiable Approach for Sound Texture Synthesis and Analysis

📅 2025-06-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of synthesizing texture sounds, which are stochastic and perceptually stationary. We propose TexStat, a differentiable statistical loss and evaluation metric, and TexEnv, a lightweight differentiable synthesizer, and integrate both into TexDSP, an end-to-end generative model inspired by DDSP. The key contributions are: (i) TexStat, a perceptually meaningful, time-invariant, and noise-robust loss and evaluation metric; and (ii) the integration of statistics-driven modeling with differentiable digital signal processing. TexDSP combines a TexStat-based objective, differentiable synthesis via TexEnv, and a DDSP-style neural architecture, and is evaluated with TexStat alongside Fréchet Audio Distance (FAD). Experiments across various texture sound types indicate that TexStat is perceptually meaningful and robust to noise. All tools and code are released as open source, with efficient, configurable PyTorch implementations.

📝 Abstract
In this work, we introduce TexStat, a novel loss function specifically designed for the analysis and synthesis of texture sounds characterized by stochastic structure and perceptual stationarity. Drawing inspiration from the statistical and perceptual framework of McDermott and Simoncelli, TexStat identifies similarities between signals belonging to the same texture category without relying on temporal structure. We also propose using TexStat as a validation metric alongside Fréchet Audio Distance (FAD) to evaluate texture sound synthesis models. In addition to TexStat, we present TexEnv, an efficient, lightweight, and differentiable texture sound synthesizer that generates audio by imposing amplitude envelopes on filtered noise. We further integrate these components into TexDSP, a DDSP-inspired generative model tailored for texture sounds. Through extensive experiments across various texture sound types, we demonstrate that TexStat is perceptually meaningful, time-invariant, and robust to noise, features that make it effective both as a loss function for generative tasks and as a validation metric. All tools and code are provided as open-source contributions, and our PyTorch implementations are efficient, differentiable, and highly configurable, enabling their use both in generative tasks and as a perceptually grounded evaluation metric.
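To illustrate why a loss built on time-averaged statistics can ignore temporal alignment, here is a minimal sketch in plain Python. It is not the TexStat implementation: it assumes a single broadband envelope (simple rectification) and only four moments, whereas the paper draws on the richer McDermott and Simoncelli statistics and ships a differentiable PyTorch version. The names `summary_stats`, `stats_distance`, and `sample_mse` are hypothetical.

```python
import math
import random


def summary_stats(signal):
    """Time-averaged moments of a crude amplitude envelope (assumption: |x| as envelope)."""
    env = [abs(x) for x in signal]
    n = len(env)
    mean = sum(env) / n
    var = sum((e - mean) ** 2 for e in env) / n
    std = math.sqrt(var)
    skew = sum((e - mean) ** 3 for e in env) / (n * std ** 3)
    kurt = sum((e - mean) ** 4 for e in env) / (n * var ** 2)
    return [mean, std, skew, kurt]


def stats_distance(a, b):
    """Euclidean distance between the two statistic vectors (hypothetical stand-in for a texture loss)."""
    sa, sb = summary_stats(a), summary_stats(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sa, sb)))


def sample_mse(a, b):
    """Ordinary sample-by-sample mean squared error, for comparison."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
shifted = noise[1000:] + noise[:1000]   # same texture, circularly shifted in time
louder = [3.0 * x for x in noise]       # a statistically different signal

print("stats distance, shifted:", stats_distance(noise, shifted))
print("sample MSE, shifted:    ", sample_mse(noise, shifted))
print("stats distance, louder: ", stats_distance(noise, louder))
```

The shifted copy is nearly indistinguishable under the statistics-based distance but far away under sample-wise error, which is the behavior a texture loss needs; the scaled copy shows that the distance still reacts when the statistics themselves change.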
Problem

Research questions and friction points this paper is trying to address.

Develops TexStat for analyzing and synthesizing stochastic texture sounds
Proposes TexEnv as a lightweight differentiable texture sound synthesizer
Integrates TexStat and TexEnv into TexDSP for generative texture modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

TexStat: a novel loss function for texture sounds
TexEnv: lightweight differentiable texture sound synthesizer
TexDSP: DDSP-inspired generative model for textures
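The abstract describes TexEnv as generating audio by imposing amplitude envelopes on filtered noise. The sketch below shows that idea in its simplest form, under stated assumptions: a single band, a one-pole low-pass filter, and a linearly interpolated envelope. The actual TexEnv design (its filters, band layout, and parameterization) is not given in this summary, and the function names here are hypothetical. Plain Python is used for portability, so unlike the paper's PyTorch code this version is not differentiable.

```python
import random


def one_pole_lowpass(x, alpha):
    """Filter noise with y[n] = alpha * x[n] + (1 - alpha) * y[n-1]."""
    y, prev = [], 0.0
    for s in x:
        prev = alpha * s + (1.0 - alpha) * prev
        y.append(prev)
    return y


def interpolate_envelope(points, length):
    """Linearly interpolate a few envelope control points up to `length` samples."""
    out = []
    for n in range(length):
        pos = n * (len(points) - 1) / (length - 1)
        i = min(int(pos), len(points) - 2)
        frac = pos - i
        out.append((1.0 - frac) * points[i] + frac * points[i + 1])
    return out


def synth_texture(env_points, length, alpha=0.2, seed=0):
    """Impose an amplitude envelope on filtered noise (assumed single-band simplification)."""
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(length)]
    band = one_pole_lowpass(noise, alpha)
    env = interpolate_envelope(env_points, length)
    return [e * b for e, b in zip(env, band)]


# A handful of envelope control points drives 1000 output samples.
audio = synth_texture([0.0, 1.0, 0.2, 0.8, 0.0], length=1000)
print(len(audio), max(abs(a) for a in audio))
```

In a DDSP-style model, a network would predict the envelope control points (five numbers here) instead of raw samples, which keeps the synthesizer lightweight.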