Enhancing Small-Scale Dataset Expansion with Triplet-Connection-based Sample Re-Weighting

📅 2025-08-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the performance degradation that noisy generated images cause in generative data augmentation under few-shot settings, this paper proposes TriReWeight, a triplet-connection-based sample re-weighting method. It theoretically models the structured relationships among real samples, generated samples, and labels, analyzes three types of supervision signals, and adaptively assigns weights to improve noise robustness without modifying the underlying generative model. Designed for plug-and-play integration, TriReWeight is compatible with arbitrary generative augmentation frameworks, and its generalization error converges at a rate approaching the optimal bound. Experiments on six natural-image and three medical-imaging benchmarks show that TriReWeight outperforms state-of-the-art methods by 7.9% (natural) and 3.4% (medical) on average, and that it improves the generalization performance of diverse generative augmentation strategies.

📝 Abstract

The performance of computer vision models in certain real-world applications, such as medical diagnosis, is often limited by the scarcity of available images. Expanding datasets using pre-trained generative models is an effective solution. However, due to the uncontrollable generation process and the ambiguity of natural language, noisy images may be generated. Re-weighting is an effective way to address this issue by assigning low weights to such noisy images. We first theoretically analyze three types of supervision for the generated images. Based on the theoretical analysis, we develop TriReWeight, a triplet-connection-based sample re-weighting method to enhance generative data augmentation. Theoretically, TriReWeight can be integrated with any generative data augmentation method and never degrades its performance. Moreover, its generalization error approaches the optimum at a rate of $O(\sqrt{d\ln(n)/n})$. Our experiments validate the correctness of the theoretical analysis and demonstrate that our method outperforms the existing SOTA methods by $7.9\%$ on average over six natural image datasets and by $3.4\%$ on average over three medical datasets. We also experimentally validate that our method can enhance the performance of different generative data augmentation methods.
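The general idea of re-weighting described in the abstract can be illustrated with a minimal sketch. This is a generic weighted cross-entropy, not the TriReWeight algorithm itself: the abstract does not say how the weights are computed, so here they are simply given as inputs. The function names (`softmax`, `cross_entropy`, `reweighted_loss`) and the normalization by total weight are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability of the labeled class."""
    return -math.log(softmax(logits)[label])


def reweighted_loss(real, generated, weights):
    """Average loss where each generated sample carries its own weight.

    real, generated: lists of (logits, label) pairs.
    weights: one value in [0, 1] per generated sample (hypothetical input;
    how they are obtained is outside the scope of this sketch).
    Real samples always count with weight 1.
    """
    if len(generated) != len(weights):
        raise ValueError("need exactly one weight per generated sample")
    real_loss = sum(cross_entropy(l, y) for l, y in real)
    gen_loss = sum(w * cross_entropy(l, y)
                   for w, (l, y) in zip(weights, generated))
    return (real_loss + gen_loss) / (len(real) + sum(weights))


# One clean real sample and one generated sample whose label looks wrong.
real = [([2.0, 0.0], 0)]
generated = [([2.0, 0.0], 1)]
print(reweighted_loss(real, generated, [1.0]))  # noisy sample fully trusted
print(reweighted_loss(real, generated, [0.0]))  # noisy sample ignored
```

With a weight of 0 the noisy generated sample contributes nothing and the loss reduces to the loss on the real data alone, which is the effect that assigning low weights to noisy images is meant to achieve.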

Problem

Research questions and friction points this paper is trying to address.

Address noisy images in generative data augmentation

Improve small-scale dataset expansion with re-weighting

Enhance performance of computer vision models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Triplet-connection-based sample re-weighting method

Integration with any generative augmentation methods

Optimal generalization with theoretical guarantees
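
To get a feel for the stated rate $O(\sqrt{d\ln(n)/n})$, the sketch below evaluates the expression inside the big-O for a few sample sizes. It assumes that n denotes the number of training samples and d a dimension or complexity term; the abstract defines neither symbol, and the hidden constant is unknown, so the printed values only show how the term shrinks as n grows, not an actual error bound. The function name `rate_term` is hypothetical.

```python
import math


def rate_term(d, n):
    """Return sqrt(d * ln(n) / n), the term inside the big-O (constant omitted)."""
    if d <= 0 or n <= 1:
        raise ValueError("requires d > 0 and n > 1")
    return math.sqrt(d * math.log(n) / n)


# Assumed values of d and n, chosen only for illustration.
for n in (100, 1_000, 10_000, 100_000):
    print(n, round(rate_term(10, n), 4))
```

Because $\ln(n)/n$ decreases for $n > e$, the term falls steadily with more samples and rises with larger d.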
