AI Summary
This work addresses the low fidelity and texture distortion of synthetic breast ultrasound images generated for data augmentation. To overcome these limitations, the authors propose a hybrid diffusion model based on Stable Diffusion that combines text-to-image generation with image-to-image refinement. The approach incorporates Low-Rank Adaptation (LoRA) and Textual Inversion (TI) to enhance visual realism while preserving class consistency. On the BUSI dataset, the method substantially improves generation quality, reducing the Fréchet Inception Distance (FID) from 45.97 (Stable Diffusion v1.5 baseline) to 33.29, while maintaining comparable performance on downstream classification tasks.
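LoRA freezes a pretrained weight matrix W and learns only a low-rank update, so an adapted layer computes W x + (α/r) B A x, where B has shape d_out × r, A has shape r × d_in, and the rank r is much smaller than either dimension. The following is a minimal NumPy sketch of one such layer, not the authors' implementation. The class name `LoRALinear`, the rank, α, and the initialization scale are illustrative assumptions rather than settings from the paper. B starts at zero, following the original LoRA recipe, so the adapted layer initially reproduces the frozen one. In Stable Diffusion fine-tuning, adapters like this are commonly attached to the UNet's attention projections; which layers the authors adapt is not stated here.

```python
import numpy as np


class LoRALinear:
    """Illustrative sketch (hypothetical class): frozen weight W plus a trainable low-rank update B @ A.

    Rank, alpha, and init scale are assumptions, not the paper's settings.
    """

    def __init__(self, weight, rank, alpha, rng):
        out_dim, in_dim = weight.shape
        self.weight = weight  # pretrained weight, kept frozen during fine-tuning
        # A gets a small random init; B starts at zero so B @ A == 0 initially.
        self.A = rng.normal(scale=0.01, size=(rank, in_dim))
        self.B = np.zeros((out_dim, rank))
        self.scale = alpha / rank  # the usual alpha / r scaling of the update

    def __call__(self, x):
        # x has shape (batch, in_dim); returns shape (batch, out_dim).
        frozen_out = x @ self.weight.T
        lora_out = (x @ self.A.T) @ self.B.T  # project down to rank r, then back up
        return frozen_out + self.scale * lora_out

    def merged_weight(self):
        # After training, the update can be folded into W for inference.
        return self.weight + self.scale * (self.B @ self.A)

    def num_trainable(self):
        # Only A and B are trained; W stays frozen.
        return self.A.size + self.B.size
```

For a 64 × 128 layer at rank 4, the adapter trains 4·128 + 64·4 = 768 parameters instead of the 8,192 in W, and because the update is linear it can be merged into W after training, adding no inference cost.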
Abstract
We propose a hybrid diffusion-based augmentation framework to address the critical challenge of data augmentation for breast ultrasound (BUS) datasets. Unlike conventional diffusion-based augmentation, our approach improves visual fidelity and preserves ultrasound texture by combining text-to-image generation with image-to-image (img2img) refinement and by fine-tuning with low-rank adaptation (LoRA) and textual inversion (TI). Our method generates realistic, class-consistent images for BUSI, an open-source breast ultrasound image dataset available on Kaggle. Compared with the Stable Diffusion v1.5 baseline, incorporating TI and img2img refinement reduced the Fréchet Inception Distance (FID) from 45.97 to 33.29, a substantial gain in fidelity, while maintaining comparable downstream classification performance. Overall, the proposed framework mitigates the low fidelity of synthetic ultrasound images and improves augmentation quality for robust diagnostic modeling.
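FID, the fidelity metric reported above, fits a Gaussian to feature embeddings of real and synthetic images and computes the Fréchet distance between the two: d² = ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}), where lower is better. The sketch below implements only that formula in NumPy, assuming feature vectors have already been extracted. Standard FID uses 2048-dimensional Inception-v3 pooling features, which this sketch does not compute. The function names are hypothetical, and this is not the authors' evaluation code. The trace of (Σ_r Σ_g)^{1/2} is obtained from the eigenvalues of the symmetric matrix Σ_r^{1/2} Σ_g Σ_r^{1/2}, which has the same eigenvalues as Σ_r Σ_g.

```python
import numpy as np


def _sqrt_psd(matrix):
    # Square root of a symmetric positive semi-definite matrix via eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(matrix)
    eigvals = np.clip(eigvals, 0.0, None)  # clip tiny negative values from round-off
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def frechet_distance(feats_real, feats_gen):
    """Hypothetical helper: Frechet distance between Gaussians fitted to two feature sets.

    Both inputs have shape (num_images, feature_dim), with feature_dim >= 2.
    Feature extraction (e.g. Inception-v3) is assumed to have happened already.
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)

    # Tr((cov_r cov_g)^{1/2}) from the symmetric form cov_r^{1/2} cov_g cov_r^{1/2}.
    sqrt_r = _sqrt_psd(cov_r)
    middle = sqrt_r @ cov_g @ sqrt_r
    middle = (middle + middle.T) / 2.0  # enforce exact symmetry before eigvalsh
    eigvals = np.clip(np.linalg.eigvalsh(middle), 0.0, None)
    trace_covmean = np.sqrt(eigvals).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_covmean)
```

As a sanity check, comparing a feature set with itself gives a distance of about 0. Comparing it with a copy shifted by 1.0 in each of 4 dimensions leaves the covariance term at zero, so the distance is about 4.0, which is just the squared mean shift.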