🤖 AI Summary
To address data scarcity, high annotation costs, and legal and privacy constraints in image classification on small datasets, this paper proposes HydraMix, a multi-image feature-mixing architecture. Methodologically, HydraMix uses a segmentation-based mixing mask to fuse, in feature space, the content of multiple images from the same class. It is optimized with a combination of unsupervised and adversarial training to generate new image compositions. The paper also introduces a novel text-image metric to assess the generality of augmented datasets. On small-scale benchmarks (ciFAIR-10, ciFAIR-100, and STL-10), HydraMix outperforms existing state-of-the-art methods and enables models to be trained from scratch on very small datasets.
📝 Abstract
Training deep neural networks requires datasets with a large number of annotated examples. Collecting and annotating these datasets is not only extremely expensive but also raises legal and privacy issues. These factors significantly limit many real-world applications. To address this, we introduce HydraMix, a novel architecture that generates new image compositions by mixing multiple different images from the same class. HydraMix learns to fuse the content of several images, guided by a segmentation-based mixing mask in feature space, and is optimized via a combination of unsupervised and adversarial training. Our data augmentation scheme enables training models from scratch on very small datasets. We conduct extensive experiments on ciFAIR-10, STL-10, and ciFAIR-100. Additionally, we introduce a novel text-image metric to assess the generality of the augmented datasets. Our results show that HydraMix outperforms existing state-of-the-art methods for image classification on small datasets.
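The abstract does not specify how the segmentation-based mixing mask is built or applied. To make the idea of "fusing several same-class images in feature space" concrete, the following is a minimal sketch under stated assumptions. Each of K source images is assumed to have already been encoded into a feature map of shape (C, H, W). A hard, one-hot segment map then assigns every spatial location to exactly one source. The helpers `mix_features` and `random_grid_segments` are hypothetical. The random coarse-grid segment map is a stand-in for a real segmentation, and the hard mask is a simplification of the learned fusion described in the paper. The unsupervised and adversarial training is not shown.

```python
import numpy as np


def random_grid_segments(k: int, h: int, w: int, grid: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical stand-in for a segmentation: assign each cell of a coarse
    grid x grid layout to one of k source images, then upsample to (h, w)
    with nearest-neighbour indexing."""
    coarse = rng.integers(0, k, size=(grid, grid))
    rows = np.arange(h) * grid // h  # coarse row index for each output row
    cols = np.arange(w) * grid // w  # coarse column index for each output column
    return coarse[rows[:, None], cols[None, :]]


def mix_features(features: np.ndarray, segment_ids: np.ndarray) -> np.ndarray:
    """Fuse K feature maps of shape (K, C, H, W) into one (C, H, W) map.

    segment_ids is an (H, W) integer map in [0, K) saying which source image
    supplies the features at each spatial location (a hard mixing mask)."""
    k = features.shape[0]
    if segment_ids.min() < 0 or segment_ids.max() >= k:
        raise ValueError("segment_ids must index into the K source images")
    # One-hot mask of shape (K, H, W): mask[i, y, x] == 1 where source i owns (y, x).
    mask = (np.arange(k)[:, None, None] == segment_ids[None, :, :]).astype(features.dtype)
    # Masks sum to 1 at every location, so this selects one source per location.
    return (mask[:, None, :, :] * features).sum(axis=0)


# Usage: mix three random 8-channel 16x16 "feature maps" (placeholder data).
rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 8, 16, 16)).astype(np.float32)
seg = random_grid_segments(3, 16, 16, grid=4, rng=rng)
mixed = mix_features(feats, seg)
```

A learned approach could replace the one-hot mask with soft per-source weights, for example a softmax over K predicted logits at each location. The weights would still sum to 1, but the boundaries between sources could blend. Whether HydraMix does this is not stated in the abstract.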