🤖 AI Summary
This work addresses low fidelity and texture distortion in synthetic breast ultrasound images generated for data augmentation. The authors propose a hybrid diffusion framework built on Stable Diffusion that combines text-to-image generation with image-to-image (img2img) refinement, fine-tuned with Low-Rank Adaptation (LoRA) and Textual Inversion (TI) to improve visual realism while preserving class consistency. On the BUSI dataset, adding TI and img2img refinement to the Stable Diffusion v1.5 baseline reduces the Fréchet Inception Distance (FID) from 45.97 to 33.29, while downstream classification performance remains comparable.
📝 Abstract
We propose a hybrid diffusion-based augmentation framework to address the low fidelity of synthetic images used for data augmentation in breast ultrasound (BUS) datasets. Unlike conventional diffusion-based augmentation, our approach improves visual fidelity and preserves ultrasound texture by combining text-to-image generation with image-to-image (img2img) refinement, together with fine-tuning via low-rank adaptation (LoRA) and textual inversion (TI). Our method generated realistic, class-consistent images on the open-source breast ultrasound image dataset (BUSI) available on Kaggle. Compared to the Stable Diffusion v1.5 baseline, incorporating TI and img2img refinement reduced the Fréchet Inception Distance (FID) from 45.97 to 33.29, demonstrating a substantial gain in fidelity while maintaining comparable downstream classification performance. Overall, the proposed framework mitigates the low-fidelity limitations of synthetic ultrasound images and improves the quality of augmentation for robust diagnostic modeling.
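To make the FID figures concrete, the following is a minimal sketch of how a Fréchet distance between two feature sets is commonly computed, not the authors' evaluation code. It assumes features have already been extracted (standard FID uses Inception-v3 pooling features; the random arrays in the usage note are stand-ins), and the function names `frechet_distance` and `relative_reduction` are hypothetical.

```python
import numpy as np


def frechet_distance(feats_real, feats_fake):
    """Fréchet distance between Gaussians fitted to two feature sets.

    Each input is a 2-D array with one row per image and one column per
    feature dimension. Assumption: features were extracted beforehand.
    """
    mu_r = feats_real.mean(axis=0)
    mu_f = feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)

    # Tr((cov_r cov_f)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_f, which are real and non-negative for
    # positive semi-definite covariances (clipped here for numerical noise).
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    mean_term = np.sum((mu_r - mu_f) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_f) - 2.0 * trace_sqrt)


def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before
```

As a usage example, `frechet_distance(x, x)` is approximately zero for any feature array `x`, and shifting every feature vector by a constant offset increases the distance by the squared length of that offset. Applied to the reported scores, `relative_reduction(45.97, 33.29)` gives roughly 0.276, that is, a relative FID reduction of about 27.6%.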