🤖 AI Summary
This work addresses a problem caused by artificial annotations in clinical ultrasound images, such as measurement calipers and text overlays: they can induce shortcut learning, in which models rely on marker-related cues instead of the underlying anatomical structures. To mitigate this, the authors propose Echo-DM, a framework that combines a conditional latent diffusion model with a region-aware fusion mechanism. It removes annotations end to end without requiring mask inputs while preserving anatomical fidelity. Echo-DM is compatible with different latent modules, including VAEs and RAEs, and outperforms representative two-stage baselines on Echo-PAIR, a large-scale paired clinical ultrasound dataset. It also offers favorable trade-offs between annotation removal quality, structural preservation, and deployment efficiency.
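The summary says annotations are removed without mask inputs while anatomy is preserved, but it does not give the fusion formula. As one hypothetical reading, a region-aware fusion step could blend the model's restored image with the original input using a soft mask that the model itself predicts. Unaffected pixels would then be copied through unchanged. The function name `region_aware_fuse` and the convex-blend rule are assumptions for illustration, not Echo-DM's actual module.

```python
import numpy as np


def region_aware_fuse(marked: np.ndarray, restored: np.ndarray,
                      soft_mask: np.ndarray) -> np.ndarray:
    """Blend a restored image back into the original input (hypothetical sketch).

    `soft_mask` is assumed to be *predicted* by the model (values in [0, 1],
    1 = likely marker pixel), not supplied by the user, so inference stays
    mask-free from the user's point of view.
    """
    if not (marked.shape == restored.shape == soft_mask.shape):
        raise ValueError("all inputs must share the same shape")
    m = np.clip(soft_mask, 0.0, 1.0)
    # Where m is ~0 the original pixel is kept verbatim, so unaffected
    # background texture is not perturbed by the generative model;
    # where m is ~1 the restored pixel replaces the marker.
    return m * restored + (1.0 - m) * marked
```

With a mask that is 1 only on a caliper pixel, the output equals `restored` at that pixel and keeps every other pixel of `marked` exactly. This is the preservation property the blend is meant to illustrate.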
📝 Abstract
Clinical ultrasound images often contain artificial markers, such as measurement calipers and text, to assist diagnostic interpretation and comparison. However, these markers can introduce shortcut bias in downstream automated analysis, encouraging deep learning models to rely on marker-related cues rather than clinically meaningful anatomy. Existing marker removal methods are either mask-dependent, and therefore vulnerable to error propagation from inaccurate masks, or mask-free deterministic restorers that may over-smooth ultrasound texture and perturb unaffected background regions. To address these challenges, we present Echo-DM, a framework for ultrasound marker removal via conditional latent diffusion and region-aware fusion. Echo-DM follows a common encoder-diffusion-decoder pipeline, in which a DiT-based conditional latent diffusion network performs global restoration and a region-aware fusion module enforces preservation-aware refinement in image space, all under end-to-end mask-free inference. Building on this fixed core design, we further instantiate Echo-DM-V and Echo-DM-R with VAE-based and RAE-based latent modules, respectively, demonstrating that the Echo-DM architecture is compatible with diverse latent-module instantiations. Extensive experiments on Echo-PAIR, a large-scale paired clinical ultrasound dataset, show superior marker removal and strong anatomical fidelity compared with representative two-stage baselines, along with favorable quality-efficiency trade-offs across deployment settings. Data, code, and models will be released at https://github.com/MiliLab/Echo-DM.
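The abstract describes a fixed encoder-diffusion-decoder core into which different latent modules are plugged (VAE-based for Echo-DM-V, RAE-based for Echo-DM-R). Below is a minimal sketch of that structure. It assumes a noise-prediction denoiser and a standard deterministic DDIM sampler (eta = 0). `LatentModule`, `IdentityLatent`, `ddim_sample`, `remove_markers`, and the `alpha_bar` schedule are hypothetical names, not Echo-DM's released code. The DiT denoiser is replaced by an arbitrary callable, and the region-aware fusion module is not reproduced.

```python
from typing import Callable, Protocol

import numpy as np


class LatentModule(Protocol):
    """Hypothetical interface for a swappable latent module (e.g. a VAE or an RAE)."""

    def encode(self, image: np.ndarray) -> np.ndarray: ...

    def decode(self, latent: np.ndarray) -> np.ndarray: ...


class IdentityLatent:
    """Toy stand-in so the sketch runs; a real module would be a trained VAE or RAE."""

    def encode(self, image: np.ndarray) -> np.ndarray:
        return image.astype(np.float64)

    def decode(self, latent: np.ndarray) -> np.ndarray:
        return latent


# eps_model(z_t, cond, t) -> predicted noise. Stands in for the DiT denoiser (assumption).
EpsModel = Callable[[np.ndarray, np.ndarray, int], np.ndarray]


def ddim_sample(eps_model: EpsModel, cond: np.ndarray, alpha_bar: np.ndarray,
                rng: np.random.Generator) -> np.ndarray:
    """Deterministic DDIM (eta = 0) sampling, conditioned on the encoded marked image.

    alpha_bar[t] is the cumulative noise-schedule product, decreasing in t and < 1.
    """
    z = rng.standard_normal(cond.shape)  # start from pure Gaussian noise z_T
    for t in range(len(alpha_bar) - 1, -1, -1):
        a_t = alpha_bar[t]
        a_prev = alpha_bar[t - 1] if t > 0 else 1.0
        eps = eps_model(z, cond, t)
        # Predict the clean latent, then move to the previous (lower) noise level.
        z0_hat = (z - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        z = np.sqrt(a_prev) * z0_hat + np.sqrt(1.0 - a_prev) * eps
    return z


def remove_markers(image: np.ndarray, latent_module: LatentModule, eps_model: EpsModel,
                   alpha_bar: np.ndarray, seed: int = 0) -> np.ndarray:
    """Encoder -> conditional latent diffusion -> decoder (fusion step omitted)."""
    cond = latent_module.encode(image)  # encode the marked image as the condition
    z0 = ddim_sample(eps_model, cond, alpha_bar, np.random.default_rng(seed))
    return latent_module.decode(z0)
```

`remove_markers` touches the latent module only through `encode` and `decode`, so swapping one latent module for another leaves the diffusion core unchanged. This mirrors the kind of compatibility the Echo-DM-V and Echo-DM-R instantiations are meant to demonstrate. For a quick sanity check, pass an oracle `eps_model` that returns the noise implied by a known clean latent. The sampler should then return that latent up to floating-point error.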