US-JEPA: A Joint Embedding Predictive Architecture for Medical Ultrasound

📅 2026-02-22
🤖 AI Summary
Medical ultrasound images have low signal-to-noise ratios and stochastic speckle, which limit self-supervised learning methods that reconstruct pixels. This work proposes US-JEPA, a framework built on the Joint Embedding Predictive Architecture (JEPA) that uses a frozen, domain-specific teacher to supply stable latent targets. Because the student predicts masked latent representations against a fixed teacher, student and teacher optimization are decoupled, which avoids the hyperparameter sensitivity and computational cost of online teachers updated by exponential moving average. The work also presents what the authors describe as the first rigorous comparison of publicly available ultrasound foundation models on UltraBench, a public benchmark spanning multiple organs and pathologies. Under linear probing on classification tasks, US-JEPA is competitive with or better than domain-specific and general-purpose vision foundation models.

📝 Abstract
Ultrasound (US) imaging poses unique challenges for representation learning due to its inherently noisy acquisition process. The low signal-to-noise ratio and stochastic speckle patterns hinder standard self-supervised learning methods relying on a pixel-level reconstruction objective. Joint-Embedding Predictive Architectures (JEPAs) address this drawback by predicting masked latent representations rather than raw pixels. However, standard approaches depend on hyperparameter-brittle and computationally expensive online teachers updated via exponential moving average. We propose US-JEPA, a self-supervised framework that adopts the Static-teacher Asymmetric Latent Training (SALT) objective. By using a frozen, domain-specific teacher to provide stable latent targets, US-JEPA decouples student-teacher optimization and pushes the student to expand upon the semantic priors of the teacher. In addition, we provide the first rigorous comparison of all publicly available state-of-the-art ultrasound foundation models on UltraBench, a public dataset benchmark spanning multiple organs and pathological conditions. Under linear probing for diverse classification tasks, US-JEPA achieves performance competitive with or superior to domain-specific and universal vision foundation model baselines. Our results demonstrate that masked latent prediction provides a stable and efficient path toward robust ultrasound representations.
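
The contrast the abstract draws between an online EMA teacher and a frozen teacher can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the linear `encode` function, the mean-pooling `predict_masked` predictor, the squared-error loss, and the value of `tau` are hypothetical stand-ins. The sketch only shows the general shape of masked latent prediction: targets come from a teacher whose weights never change, and the loss is computed on masked patches only.

```python
def encode(weights, patch):
    """Toy linear encoder (hypothetical): maps one patch to one latent vector."""
    return [sum(w * x for w, x in zip(row, patch)) for row in weights]


def ema_update(teacher, student, tau=0.996):
    """Online-teacher baseline: teacher <- tau * teacher + (1 - tau) * student.

    Shown only for contrast; a frozen teacher skips this step entirely.
    """
    return [tau * t + (1.0 - tau) * s for t, s in zip(teacher, student)]


def predict_masked(visible_latents, num_masked):
    """Toy predictor (hypothetical): guess every masked latent as the mean
    of the visible student latents."""
    dim = len(visible_latents[0])
    count = len(visible_latents)
    mean = [sum(v[d] for v in visible_latents) / count for d in range(dim)]
    return [list(mean) for _ in range(num_masked)]


def masked_latent_loss(predicted, targets):
    """Mean squared error between predicted and target latents.

    Both arguments hold one vector per masked patch, in the same order.
    The squared-error form is an assumption of this sketch.
    """
    total, count = 0.0, 0
    for pred, tgt in zip(predicted, targets):
        for p, t in zip(pred, tgt):
            total += (p - t) ** 2
            count += 1
    return total / count


# Four toy "patches"; patches 1 and 3 are masked out for the student.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
masked_idx, visible_idx = [1, 3], [0, 2]

teacher_w = [[1.0, 0.0], [0.0, 1.0]]  # frozen: never updated during training
student_w = [[0.5, 0.0], [0.0, 0.5]]  # the only weights that would be trained

# With a frozen teacher, targets depend only on the data, so they can be
# computed once and reused at every training step.
targets = [encode(teacher_w, patches[i]) for i in masked_idx]

# The student sees only the visible patches and predicts the masked latents.
visible = [encode(student_w, patches[i]) for i in visible_idx]
predicted = predict_masked(visible, len(masked_idx))

loss = masked_latent_loss(predicted, targets)
print(f"masked latent loss: {loss:.5f}")
```

In a real setup the encoders would be deep networks and the student would be trained by gradient descent on this loss. The point of the sketch is the data flow: `ema_update` is never called in the frozen-teacher path, so the targets stay fixed while the student changes.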

Problem

Research questions and friction points this paper is trying to address.

ultrasound imaging
representation learning
self-supervised learning
noise
speckle patterns

Innovation

Methods, ideas, or system contributions that make the work stand out.

US-JEPA
Joint-Embedding Predictive Architecture
Static-teacher Asymmetric Latent Training
self-supervised learning
ultrasound representation learning