ICH-Qwen: A Large Language Model Towards Chinese Intangible Cultural Heritage

📅 2025-05-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
China’s intangible cultural heritage (ICH) faces severe challenges—including transmission discontinuity and skill attrition—amid rapid modernization. Existing large language models (LLMs) lack domain-specific adaptation for ICH, limiting their applicability in digital humanities and heritage preservation. To address this, we introduce the first Chinese LLM dedicated to Chinese ICH: built upon the Qwen architecture, it integrates domain-specific pretraining on ICH corpora, synthetic data augmentation tailored to ICH knowledge, supervised fine-tuning, and explicit knowledge alignment. This model achieves the first systematic deep semantic modeling of ICH within LLMs. Empirical evaluation demonstrates substantial improvements over general-purpose baselines across key tasks—including ICH question answering, generative description of traditional craftsmanship, and simulated dialogues with heritage bearers. The work provides a deployable, scalable technical framework and methodological paradigm for intelligent ICH preservation and digital humanities research.

Technology Category

Application Category

📝 Abstract
The intangible cultural heritage (ICH) of China, a cultural asset transmitted across generations by various ethnic groups, serves as a significant testament to the evolution of human civilization and holds irreplaceable value for the preservation of historical lineage and the enhancement of cultural self-confidence. However, the rapid pace of modernization poses formidable challenges to ICH, including threats damage, disappearance and discontinuity of inheritance. China has the highest number of items on the UNESCO Intangible Cultural Heritage List, which is indicative of the nation's abundant cultural resources and emphasises the pressing need for ICH preservation. In recent years, the rapid advancements in large language modelling have provided a novel technological approach for the preservation and dissemination of ICH. This study utilises a substantial corpus of open-source Chinese ICH data to develop a large language model, ICH-Qwen, for the ICH domain. The model employs natural language understanding and knowledge reasoning capabilities of large language models, augmented with synthetic data and fine-tuning techniques. The experimental results demonstrate the efficacy of ICH-Qwen in executing tasks specific to the ICH domain. It is anticipated that the model will provide intelligent solutions for the protection, inheritance and dissemination of intangible cultural heritage, as well as new theoretical and practical references for the sustainable development of intangible cultural heritage. Furthermore, it is expected that the study will open up new paths for digital humanities research.
Problem

Research questions and friction points this paper is trying to address.

Preserving Chinese intangible cultural heritage (ICH) from threats and disappearance
Developing a language model (ICH-Qwen) for ICH domain tasks
Providing intelligent solutions for ICH protection and digital humanities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes open-source Chinese ICH data corpus
Employs natural language understanding and reasoning
Augments with synthetic data and fine-tuning
🔎 Similar Papers
No similar papers found.
W
Wenhao Ye
Nanjing Agricultural University, Nanjing, Jiangsu, China
T
Tiansheng Zheng
Nanjing Agricultural University, Nanjing, Jiangsu, China
Yue Qi
Yue Qi
Beihang University
W
Wenhua Zhao
Nanjing Agricultural University, Nanjing, Jiangsu, China
X
Xiyu Wang
Nanjing Agricultural University, Nanjing, Jiangsu, China
X
Xue Zhao
Nanjing Agricultural University, Nanjing, Jiangsu, China
J
Jiacheng He
Nanjing Agricultural University, Nanjing, Jiangsu, China
Y
Yaya Zheng
Nanjing Institute of Technology, Nanjing, Jiangsu, China
D
Dongbo Wang
Nanjing Agricultural University, Nanjing, Jiangsu, China