Cohort-Scale Neural Atlases of Ultrasound Video

📅 2026-05-30

🤖 AI Summary

This work targets three obstacles in ultrasound video analysis: the high cost of expert per-frame annotation; appearance variability caused by speckle, shadowing, attenuation, and operator-dependent probe pose; and the difficulty of modeling dynamic clinical information. The study extends neural atlases to cohort-level ultrasound video, using a Generative Latent Optimization (GLO) framework trained in DINOv3 feature space. Each video receives its own latent embedding while all videos share a single canonical chart, so annotations can be transferred across sequences through atlas space. The method works in both single-shot and few-shot settings, and its latent space is interpretable and supports smooth interpolation. Across five cardiac and musculoskeletal ultrasound datasets it learns coherent canonical templates; on EchoNet-Dynamic and MSK-Bone, annotation-transfer accuracy is competitive with strong dense-correspondence baselines, with training taking minutes on a single consumer GPU.
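To make the Generative Latent Optimization idea concrete, here is a minimal NumPy sketch, not the paper's implementation. It assumes a toy setting in which random low-rank vectors stand in for frozen DINOv3 features and the shared decoder is a single linear map; the names `make_toy_cohort` and `fit_glo` are hypothetical. What it does illustrate is the defining trait of GLO: there is no encoder, and each video's latent code is a free parameter optimized jointly with the shared decoder.

```python
import numpy as np

def make_toy_cohort(n_videos=6, frames_per_video=10, feat_dim=8, seed=1):
    """Hypothetical toy data: low-rank vectors standing in for backbone features."""
    rng = np.random.default_rng(seed)
    z_true = rng.standard_normal((n_videos, 2))
    W_true = rng.standard_normal((2, feat_dim))
    video_ids = np.repeat(np.arange(n_videos), frames_per_video)
    noise = 0.01 * rng.standard_normal((video_ids.size, feat_dim))
    features = z_true[video_ids] @ W_true + noise
    return features, video_ids

def fit_glo(features, video_ids, latent_dim=4, steps=500, lr=0.05, seed=0):
    """Jointly fit one latent code per video and a shared linear decoder.

    features:  (n_frames, feat_dim) array of target features
    video_ids: (n_frames,) integer array mapping each frame to its video
    """
    rng = np.random.default_rng(seed)
    n_frames, feat_dim = features.shape
    n_videos = int(video_ids.max()) + 1
    z = 0.1 * rng.standard_normal((n_videos, latent_dim))  # per-video codes
    W = 0.1 * rng.standard_normal((latent_dim, feat_dim))  # shared decoder
    losses = []
    for _ in range(steps):
        codes = z[video_ids]                 # look up each frame's video code
        err = codes @ W - features           # reconstruction residual
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        grad_pred = 2.0 * err / n_frames     # d(loss) / d(prediction)
        grad_W = codes.T @ grad_pred
        grad_z = np.zeros_like(z)
        # Accumulate frame-level gradients into the code of the owning video.
        np.add.at(grad_z, video_ids, grad_pred @ W.T)
        W -= lr * grad_W                     # update decoder and codes together
        z -= lr * grad_z
    return z, W, losses

features, video_ids = make_toy_cohort()
z, W, losses = fit_glo(features, video_ids)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.5f}")
```

In a realistic setting the linear decoder would be replaced by a neural network and plain gradient descent by a standard optimizer; the sketch only shows how the per-video codes and the shared model are updated in the same loop.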

📝 Abstract

Ultrasound is the most widely used real-time imaging modality in clinical practice, yet per-frame video annotation remains a major bottleneck: expert labels are scarce and costly, and image appearance varies with speckle, shadowing, attenuation, and operator-dependent probe pose. This is especially limiting because clinically relevant information is often dynamic, from left-ventricular motion in echocardiography to muscle and bone kinematics in musculoskeletal imaging. Population atlases can amortize annotation cost by registering observations to a shared canonical coordinate system, but existing neural atlas methods mainly target single videos, small test-time image sets, or object-centric image collections. We introduce a cohort-scale neural atlas for ultrasound video: a single canonical chart with per-video Generative Latent Optimization embeddings, trained jointly over thousands of frames in DINOv3 feature space. Across five cardiac and musculoskeletal datasets with point landmarks and segmentation masks, our method learns coherent canonical templates and enables accurate atlas-space annotation transfer. On EchoNet-Dynamic and MSK-Bone, it supports single- and few-shot transfer with accuracy competitive with strong dense-correspondence baselines, while training in minutes on a single consumer GPU. The learned embeddings are interpretable: linear projections reveal structured cohort variation, image-decoder interpolation produces anatomically plausible intermediate frames, and test-time latent inversion reconstructs held-out frames through the atlas. These results suggest that cohort-scale neural atlases offer a practical, interpretable representation for reducing expert annotation burden in ultrasound video analysis.
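Atlas-space annotation transfer can be illustrated with a small sketch. It assumes, purely for illustration, that each frame's mapping into the canonical chart is an invertible 2D affine transform; the method described above presumably learns richer neural mappings, and the function names here are hypothetical. A landmark annotated on one frame is pushed into atlas coordinates and then pulled back into another frame.

```python
import numpy as np

def to_atlas(points, A, t):
    """Map (n, 2) frame coordinates into atlas coordinates: x_atlas = A x + t."""
    return points @ A.T + t

def from_atlas(points, A, t):
    """Invert the frame-to-atlas map: solve A x = x_atlas - t for x."""
    return np.linalg.solve(A, (points - t).T).T

def transfer_landmarks(points_src, map_src, map_tgt):
    """Carry landmarks from a source frame to a target frame via the atlas.

    map_src, map_tgt: (A, t) tuples describing each frame's map to the atlas.
    """
    atlas_points = to_atlas(points_src, *map_src)
    return from_atlas(atlas_points, *map_tgt)

# Toy usage: two frames that view the same canonical landmarks differently.
canonical = np.array([[0.2, 0.3], [0.7, 0.5], [0.4, 0.9]])
map_src = (np.array([[1.1, 0.1], [0.0, 0.9]]), np.array([0.05, -0.02]))
map_tgt = (np.array([[0.8, -0.2], [0.2, 1.2]]), np.array([-0.10, 0.03]))
labels_src = from_atlas(canonical, *map_src)   # annotation drawn on the source frame
labels_tgt = transfer_landmarks(labels_src, map_src, map_tgt)
print(labels_tgt)
```

Because every frame is registered to the same chart, one annotated frame suffices to label any other registered frame, which is the sense in which an atlas amortizes annotation cost.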

Problem

Research questions and friction points this paper is trying to address.

ultrasound video

annotation bottleneck

dynamic clinical information

image appearance variability

expert labeling cost

Innovation

Methods, ideas, or system contributions that make the work stand out.

neural atlas

ultrasound video

cohort-scale learning