🤖 AI Summary
Mainstream large language models (LLMs) exhibit significant deficiencies in understanding and correcting Chinese cultural knowledge—such as classical poetry, idioms (chengyu), and proverbs—due to the lack of culturally grounded evaluation benchmarks. Method: We introduce CKnowEdit, the first knowledge editing evaluation dataset tailored to Chinese linguistic and cultural characteristics. It covers seven categories of cultural knowledge and systematically models Chinese-specific challenges—including antithetical parallelism, polyphonic characters, and nested logical structures—using multi-source data from classical texts and online communities, rigorously validated by human annotators and domain experts. The dataset comprises high-quality samples spanning three error types: linguistic, factual, and logical. Contribution/Results: Experiments reveal systematic semantic-level failures of current LLMs on cultural knowledge and expose critical performance bottlenecks of mainstream knowledge editing methods in Chinese contexts. CKnowEdit fills a key gap in Chinese knowledge editing benchmarks and establishes a new paradigm for evaluating and enhancing culture-aware language capabilities.
📝 Abstract
Chinese, as a linguistic system rich in depth and complexity, is characterized by distinctive elements such as ancient poetry, proverbs, idioms, and other cultural constructs. However, current Large Language Models (LLMs) face limitations in these specialized domains, highlighting the need for comprehensive datasets that can assess, continuously update, and progressively improve these culturally grounded linguistic competencies through targeted training optimizations. To address this gap, we introduce CKnowEdit, the first-ever Chinese knowledge editing dataset designed to correct linguistic, factual, and logical errors in LLMs. We collect seven types of knowledge from a wide range of sources, including classical texts, idioms, and content from Baidu Tieba Ruozhiba, taking into account the unique polyphony, antithesis, and logical structures inherent in the Chinese language. By analyzing this dataset, we highlight the challenges current LLMs face in mastering Chinese. Furthermore, our evaluation of state-of-the-art knowledge editing techniques reveals opportunities to advance the correction of Chinese knowledge. Code and dataset are available at https://github.com/zjunlp/EasyEdit.