Translation-Equivariant Self-Supervised Learning for Pitch Estimation with Optimal Transport

📅 2025-08-02
🤖 AI Summary
This work addresses numerical instability and insufficient theoretical grounding in self-supervised learning for monophonic pitch estimation. We propose a translation-equivariant self-supervised learning framework grounded in optimal transport (OT). Our method formalizes pitch translation equivariance as minimization of the Wasserstein distance between one-dimensional probability distributions, yielding a theoretically grounded, differentiable, and numerically stable loss function. Coupled with a translation-equivariant neural architecture, the framework enables end-to-end optimization. The OT objective replaces the heuristic contrastive or reconstruction objectives prevalent in prior work on one-dimensional translation-equivariant signal modeling. Evaluated on standard monophonic pitch estimation benchmarks, the approach is reported to achieve state-of-the-art performance, with improved training stability and generalization accuracy. These results support the framework's numerical robustness and practical efficacy.
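
To make the idea concrete, here is a minimal sketch of a one-dimensional Wasserstein-1 distance used as an equivariance penalty. It is not the paper's implementation: the function names `wasserstein_1d`, `shift`, and `equivariance_loss`, the unit bin spacing, and the zero-padded shift are all assumptions made for illustration. It relies only on the standard fact that, for two histograms on the same evenly spaced grid, the Wasserstein-1 distance equals the sum of absolute differences between their cumulative distributions.

```python
from itertools import accumulate


def wasserstein_1d(p, q):
    """Wasserstein-1 distance between two histograms on the same unit-spaced grid.

    Both inputs are assumed to be non-negative and to sum to the same total mass.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    cdf_p = accumulate(p)
    cdf_q = accumulate(q)
    # In 1D, W1 is the L1 distance between the cumulative distributions.
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def shift(p, k):
    """Translate a histogram by k bins with zero padding (hypothetical helper).

    Assumes no mass is pushed past either end of the grid.
    """
    n = len(p)
    if k >= 0:
        return [0.0] * k + list(p[: n - k])
    return list(p[-k:]) + [0.0] * (-k)


def equivariance_loss(pred, pred_shifted, k):
    """Penalty that is zero when pred_shifted equals pred translated by k bins.

    pred and pred_shifted stand for a model's output distributions for an input
    and for the same input pitch-shifted by k bins (illustrative assumption).
    """
    return wasserstein_1d(shift(pred, k), pred_shifted)


p = [0.0, 0.25, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0]
q = shift(p, 3)
print(wasserstein_1d(p, q))        # distance grows linearly with the shift
print(equivariance_loss(p, q, 3))  # zero for a perfectly equivariant output
```

Because the distance grows linearly with the size of the translation, it still gives a useful signal when the two distributions do not overlap, which is one reason a transport-based objective can behave more smoothly than a bin-wise comparison.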

📝 Abstract
In this paper, we propose an Optimal Transport objective for learning one-dimensional translation-equivariant systems and demonstrate its applicability to single pitch estimation. Our method provides a theoretically grounded, more numerically stable, and simpler alternative for training state-of-the-art self-supervised pitch estimators.

Problem

Research questions and friction points this paper is trying to address.

Develop translation-equivariant pitch estimation using Optimal Transport
Improve numerical stability in self-supervised pitch estimation
Simplify training for state-of-the-art pitch estimators

Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimal Transport for translation-equivariant learning
Self-supervised pitch estimation method
Numerically stable and simpler training