🤖 AI Summary
Existing image augmentation methods struggle to modify high-level semantic attributes—such as viewpoint, environment, weather, and intra-class fine-grained features—thereby limiting model generalization. Generative approaches (e.g., DA-Fusion) remain largely confined to texture-level manipulation and lack deep semantic diversity. To address this, we propose a diffusion-based, semantics-aware diversification framework. Our method integrates Textual Inversion embedding perturbation, LLM-driven dynamic prompt generation, and a generation-quality weighting mechanism, enabling controllable semantic editing across viewpoint, environment, weather, and intra-class attributes. Extensive experiments on multiple benchmarks demonstrate substantial improvements over standard augmentations and DA-Fusion—particularly in out-of-distribution classification accuracy and robustness. These results empirically validate the critical role of semantic-level augmentation in enhancing model generalization.
📝 Abstract
Simple data augmentation techniques, such as rotations and flips, are widely used to enhance the generalization power of computer vision models. However, these techniques often fail to modify high-level semantic attributes of a class. To address this limitation, researchers have explored generative augmentation methods like the recently proposed DA-Fusion. Despite some progress, the variations are still largely limited to textural changes, thus falling short on aspects like varied viewpoints, environment, weather conditions, or even class-level semantic attributes (e.g., variations in a dog's breed). To overcome this challenge, we propose DIAGen, building upon DA-Fusion. First, we apply Gaussian noise to the embeddings of an object learned with Textual Inversion to diversify generations using a pre-trained diffusion model's knowledge. Second, we exploit the general knowledge of a text-to-text generative model to guide the image generation of the diffusion model with varied class-specific prompts. Finally, we introduce a weighting mechanism to mitigate the impact of poorly generated samples. Experimental results across various datasets show that DIAGen not only enhances semantic diversity but also improves the performance of subsequent classifiers. The advantages of DIAGen over standard augmentations and the DA-Fusion baseline are particularly pronounced with out-of-distribution samples.