🤖 AI Summary
China’s intangible cultural heritage (ICH) faces severe challenges, including transmission discontinuity and skill attrition, amid rapid modernization. Existing large language models (LLMs) lack domain-specific adaptation for ICH, limiting their applicability in digital humanities and heritage preservation. To address this, we introduce the first Chinese LLM dedicated to Chinese ICH: built upon the Qwen architecture, it integrates domain-specific pretraining on ICH corpora, synthetic data augmentation tailored to ICH knowledge, supervised fine-tuning, and explicit knowledge alignment. This model achieves the first systematic deep semantic modeling of ICH within LLMs. Empirical evaluation demonstrates substantial improvements over general-purpose baselines across key tasks, including ICH question answering, generative description of traditional craftsmanship, and simulated dialogues with heritage bearers. The work provides a deployable, scalable technical framework and methodological paradigm for intelligent ICH preservation and digital humanities research.
📝 Abstract
The intangible cultural heritage (ICH) of China, a cultural asset transmitted across generations by various ethnic groups, serves as a significant testament to the evolution of human civilization and holds irreplaceable value for the preservation of historical lineage and the enhancement of cultural self-confidence. However, the rapid pace of modernization poses formidable challenges to ICH, including the threat of damage, disappearance and discontinuity of inheritance. China has the highest number of items on the UNESCO Intangible Cultural Heritage List, which is indicative of the nation's abundant cultural resources and emphasises the pressing need for ICH preservation. In recent years, rapid advances in large language models have provided a novel technological approach for the preservation and dissemination of ICH. This study utilises a substantial corpus of open-source Chinese ICH data to develop a large language model, ICH-Qwen, for the ICH domain. The model draws on the natural language understanding and knowledge reasoning capabilities of large language models, augmented with synthetic data and fine-tuning techniques. The experimental results demonstrate the efficacy of ICH-Qwen in executing tasks specific to the ICH domain. It is anticipated that the model will provide intelligent solutions for the protection, inheritance and dissemination of intangible cultural heritage, as well as new theoretical and practical references for its sustainable development. Furthermore, it is expected that the study will open up new paths for digital humanities research.