Synthetic Data Augmentation for Multi-Task Chinese Porcelain Classification: A Stable Diffusion Approach

📅 2026-01-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses multi-task classification of ancient Chinese ceramics (dynasty, glaze, kiln site, and vessel type) for rare categories, where the scarcity of authentic samples limits model performance. To mitigate this, the authors fine-tune Stable Diffusion with LoRA to synthesize porcelain images, then mix them with real samples at 95:5 and 90:10 real-to-synthetic ratios to train a MobileNetV3-based multi-task CNN. The reported gains are task-dependent: F1-macro rises by 5.5% for vessel type classification (at the 90:10 ratio) and by 3-4% for dynasty and kiln site identification. The authors take this as evidence that synthetic augmentation helps most when the generated features align with task-relevant visual cues, and they offer practical guidelines for data augmentation in archaeological AI.
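Since every reported gain is stated in F1-macro, a small illustration of that metric may help. The sketch below is not taken from the paper; the function name `f1_macro` and the toy vessel-type labels are assumptions for illustration only. It computes per-class F1 and averages the classes with equal weight, which is why rare classes count as much as common ones.

```python
from collections import Counter


def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, not data from the study.
y_true = ["bowl", "bowl", "vase", "vase", "jar"]
y_pred = ["bowl", "vase", "vase", "vase", "bowl"]
score = f1_macro(y_true, y_pred)
```

In this toy case the single misclassified "jar" contributes a per-class F1 of zero and pulls the average down by a full third, which is the behaviour that makes the metric sensitive to rare categories.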

📝 Abstract
The scarcity of training data presents a fundamental challenge in applying deep learning to archaeological artifact classification, particularly for the rare types of Chinese porcelain. This study investigates whether synthetic images generated through Stable Diffusion with Low-Rank Adaptation (LoRA) can effectively augment limited real datasets for multi-task CNN-based porcelain classification. Using MobileNetV3 with transfer learning, we conducted controlled experiments comparing models trained on pure real data against those trained on mixed real-synthetic datasets (95:5 and 90:10 ratios) across four classification tasks: dynasty, glaze, kiln and type identification. Results demonstrate task-specific benefits: type classification showed the most substantial improvement (5.5% F1-macro increase with 90:10 ratio), while dynasty and kiln tasks exhibited modest gains (3-4%), suggesting that synthetic augmentation effectiveness depends on the alignment between generated features and task-relevant visual signatures. Our work contributes practical guidelines for deploying generative AI in archaeological research, demonstrating both the potential and limitations of synthetic data when archaeological authenticity must be balanced with data diversity.
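To make the 95:5 and 90:10 ratios concrete, the following sketch works out how many synthetic images are needed for a given real set. It assumes that all real images are kept and synthetic ones are added until the target share is reached; the abstract does not say whether the authors did this or held the total size fixed. The helper names `synthetic_count` and `mix_datasets` are hypothetical.

```python
import random


def synthetic_count(n_real, real_share, synth_share):
    """Synthetic images needed so that real:synthetic equals real_share:synth_share."""
    return round(n_real * synth_share / real_share)


def mix_datasets(real, synthetic, real_share, synth_share, seed=0):
    """Keep every real item and add a random subset of synthetic items (assumed scheme)."""
    k = synthetic_count(len(real), real_share, synth_share)
    if k > len(synthetic):
        raise ValueError("not enough synthetic samples for the requested ratio")
    rng = random.Random(seed)
    mixed = list(real) + rng.sample(list(synthetic), k)
    rng.shuffle(mixed)  # avoid all synthetic items sitting at the end
    return mixed


# Hypothetical file names, not the study's dataset.
real = [f"real_{i}.jpg" for i in range(900)]
synthetic = [f"synth_{i}.jpg" for i in range(200)]
mixed = mix_datasets(real, synthetic, 90, 10)
```

Under this assumption, 900 real images at 90:10 call for 100 synthetic images, and 950 real images at 95:5 call for 50.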
Problem

Research questions and friction points this paper is trying to address.

data scarcity
Chinese porcelain classification
synthetic data augmentation
multi-task learning
archaeological artifact classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthetic Data Augmentation
Stable Diffusion
LoRA
Multi-Task Classification
Chinese Porcelain
Ziyao Ling
Department of Computer Science and Engineering, University of Bologna, Mura Anteo Zamboni 7, Bologna, 40127, Emilia-Romagna, Italy
S. Mirri
Department of Computer Science and Engineering, University of Bologna, Mura Anteo Zamboni 7, Bologna, 40127, Emilia-Romagna, Italy
Paola Salomoni
Department of Computer Science and Engineering, Università di Bologna
Giovanni Delnevo
PhD Student on Data Science and Computation, University of Bologna
Human Machine Interface, Machine Learning