Synthetic Data Augmentation for Multi-Task Chinese Porcelain Classification: A Stable Diffusion Approach

📅 2026-01-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses multi-task classification (dynasty, glaze color, kiln site, and vessel type) for rare categories of ancient Chinese ceramics, where the scarcity of authentic samples limits model performance. The authors fine-tune Stable Diffusion with LoRA to synthesize images of porcelain artifacts, then mix them with real samples at fixed ratios (95:5 and 90:10 real to synthetic) to train a MobileNetV3-based multi-task CNN. The gains are task-specific: F1-macro increases by 5.5% for vessel type classification at the 90:10 ratio and by 3–4% for dynasty and kiln site identification. The findings indicate that the benefit of synthetic data depends on how well the generated features align with the visual cues relevant to each task, and the paper offers practical guidelines for synthetic data augmentation in archaeological AI.
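The results are reported as F1-macro, which averages per-class F1 scores with equal weight, so rare classes count as much as common ones. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code; the function name `macro_f1` and the vessel labels in the usage note are hypothetical and chosen only for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); a class with no hits scores 0
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical vessel-type labels: 8 common "bowl" samples, 2 rare "ewer" samples.
y_true = ["bowl"] * 8 + ["ewer"] * 2
y_pred = ["bowl"] * 10  # a model that never predicts the rare class
score = macro_f1(y_true, y_pred)
```

In this toy case accuracy is 0.8, but the macro score is pulled down to about 0.44 because the rare class is never recovered. That sensitivity is why a macro-averaged metric suits a rare-category setting like this one.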

📝 Abstract

The scarcity of training data presents a fundamental challenge in applying deep learning to archaeological artifact classification, particularly for rare types of Chinese porcelain. This study investigates whether synthetic images generated through Stable Diffusion with Low-Rank Adaptation (LoRA) can effectively augment limited real datasets for multi-task CNN-based porcelain classification. Using MobileNetV3 with transfer learning, we conducted controlled experiments comparing models trained on real data only against those trained on mixed real-synthetic datasets (95:5 and 90:10 ratios) across four classification tasks: dynasty, glaze, kiln, and type identification. Results demonstrate task-specific benefits: type classification showed the most substantial improvement (5.5% F1-macro increase with the 90:10 ratio), while the dynasty and kiln tasks exhibited modest gains (3–4%), suggesting that the effectiveness of synthetic augmentation depends on the alignment between generated features and task-relevant visual signatures. Our work contributes practical guidelines for deploying generative AI in archaeological research, demonstrating both the potential and the limitations of synthetic data when archaeological authenticity must be balanced with data diversity.
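The 95:5 and 90:10 real-to-synthetic ratios can be made concrete with a short sketch. It assumes that every real image is kept and synthetic images are added on top until the target ratio is reached; the abstract does not say whether the authors instead held the total dataset size fixed. The helper names `synthetic_count` and `mix_datasets` and the file names are hypothetical.

```python
import random


def synthetic_count(n_real: int, real_share: int, synthetic_share: int) -> int:
    """Synthetic images needed so that real:synthetic matches the given ratio."""
    if n_real < 0 or real_share <= 0 or synthetic_share < 0:
        raise ValueError("counts must be non-negative and real_share positive")
    return round(n_real * synthetic_share / real_share)


def mix_datasets(real, synthetic_pool, real_share, synthetic_share, seed=0):
    """Keep all real items and add a random draw from the synthetic pool."""
    n_syn = synthetic_count(len(real), real_share, synthetic_share)
    if n_syn > len(synthetic_pool):
        raise ValueError("not enough synthetic images in the pool")
    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    mixed = list(real) + rng.sample(synthetic_pool, n_syn)
    rng.shuffle(mixed)
    return mixed


# Hypothetical file lists: 900 real photos and a pool of 200 generated images.
real_images = [f"real_{i}.jpg" for i in range(900)]
synthetic_images = [f"syn_{i}.png" for i in range(200)]
mixed_90_10 = mix_datasets(real_images, synthetic_images, 90, 10)
```

With 900 real images, the 90:10 setting adds 100 synthetic images for a training set of 1,000. In practice the draw would probably also need to be stratified by class so that the rare categories actually receive the synthetic samples.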

Problem

Research questions and friction points this paper is trying to address.

data scarcity

Chinese porcelain classification

synthetic data augmentation

multi-task learning

archaeological artifact classification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthetic Data Augmentation

Stable Diffusion

LoRA

Multi-Task Classification

Chinese Porcelain

🔎 Similar Papers

Learning Zero-Shot Material States Segmentation, by Implanting Natural Image Patterns in Synthetic Data

2024-03-05 | arXiv.org | Citations: 2

Advancing Fine-Grained Classification by Structure and Subject Preserving Augmentation

2024-06-20 | Neural Information Processing Systems | Citations: 0
