Feature-to-Image Data Augmentation: Improving Model Feature Extraction with Cluster-Guided Synthetic Samples

📅 2024-09-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address poor generalization and overfitting in few-shot learning, this paper proposes FICAug, a feature-space clustering-guided feature-to-image data augmentation framework. FICAug introduces a novel two-stage paradigm: (1) intra-class k-means clustering in the ResNet-18 feature space followed by Gaussian sampling within each cluster, and (2) generative projection of synthesized features back into the image domain via a generator network, ensuring semantic consistency and visual diversity. By augmenting limited labeled data with high-fidelity, class-discriminative synthetic images, FICAug significantly enhances both feature discriminability and reconstruction quality. Evaluated on standard few-shot benchmarks, FICAug boosts ResNet-18 classification accuracy to 88.63%, outperforming baseline methods substantially; cross-validated feature-space classification accuracy reaches 84.09%, demonstrating robust representation learning under data scarcity.

📝 Abstract
One of the growing trends in machine learning is the use of data generation techniques, since the performance of machine learning models depends on the quantity of training data available. However, in many real-world applications, particularly in medical and other low-resource domains, collecting large datasets is challenging due to resource constraints, which leads to overfitting and poor generalization. This study introduces FICAug, a novel feature-to-image data augmentation framework designed to improve model generalization under limited-data conditions by generating structured synthetic samples. FICAug first operates in the feature space, where the original data are clustered using the k-means algorithm. Within pure-label clusters, synthetic features are generated through Gaussian sampling to increase diversity while maintaining label consistency. These synthetic features are then projected back into the image domain using a generative neural network, and a convolutional neural network is trained on the reconstructed images to learn enhanced representations. Experimental results demonstrate that FICAug significantly improves classification accuracy: in feature space it achieved a cross-validation accuracy of 84.09%, while training a ResNet-18 model on the reconstructed images further boosted performance to 88.63%, illustrating the framework's effectiveness in extracting new, task-relevant features.
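The feature-space stage described above (intra-class k-means clustering followed by Gaussian sampling within each cluster) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `augment_features`, the diagonal-covariance assumption, and all parameter defaults are hypothetical, and the 512-dimensional inputs stand in for ResNet-18 features of a single class.

```python
import numpy as np
from sklearn.cluster import KMeans

def augment_features(features, n_clusters=3, n_samples_per_cluster=50, seed=None):
    """Cluster one class's feature vectors with k-means, then draw
    Gaussian samples around each cluster's mean (diagonal covariance)."""
    rng = np.random.default_rng(seed)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(features)
    synthetic = []
    for c in range(n_clusters):
        members = features[km.labels_ == c]
        mu = members.mean(axis=0)                 # per-dimension cluster mean
        sigma = members.std(axis=0) + 1e-8        # per-dimension spread (diagonal assumption)
        synthetic.append(
            rng.normal(mu, sigma, size=(n_samples_per_cluster, features.shape[1]))
        )
    return np.vstack(synthetic)

# toy example: 120 feature vectors of dimension 512 for one class
feats = np.random.default_rng(0).normal(size=(120, 512))
aug = augment_features(feats, n_clusters=3, n_samples_per_cluster=40, seed=1)
print(aug.shape)  # (120, 512): 3 clusters x 40 synthetic samples each
```

Because every synthetic vector is drawn around the statistics of a single-class cluster, label consistency is preserved by construction; in the full pipeline these vectors would then be passed to the generative network for projection back to images.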
Problem

Research questions and friction points this paper is trying to address.

Enhancing model generalization with limited training data
Generating synthetic samples via cluster-guided feature augmentation
Improving feature extraction accuracy in low-resource domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cluster-guided synthetic sample generation
Feature-to-image projection using a generative network
Enhanced CNN training on reconstructed images