🤖 AI Summary
This study addresses Chinese zero pronoun (ZP) understanding, a long-standing challenge in natural language processing for which the capabilities of current large language models (LLMs) remain unclear. The authors conduct a systematic, linguistically motivated multi-task evaluation spanning the ZP comprehension pipeline across five tasks: identification, referentiality classification, referential type classification, resolution, and translation. Evaluating a diverse set of LLMs, they find significant limitations: models perform poorly on upstream tasks such as ZP identification and referentiality classification, and even state-of-the-art reasoning-oriented models correctly translate fewer than half of Chinese ZPs into English. These findings indicate that Chinese ZPs remain highly challenging for current LLMs on both upstream and downstream tasks.
📝 Abstract
Zero Pronouns (ZPs) are a pervasive linguistic phenomenon in pro-drop languages such as Chinese and have long posed a challenge for natural language processing systems. Although Large Language Models (LLMs) perform well on many Chinese language tasks, their ability to process ZPs remains poorly understood. We conduct a systematic investigation of LLMs' handling of Chinese ZPs through a sequence of linguistically motivated tasks, including identification, referentiality classification, referential type classification, resolution, and translation. A diverse set of LLMs is evaluated across all tasks. Our results show that Chinese ZPs remain highly challenging for current LLMs, particularly for upstream tasks such as identification and referentiality classification. Performance on downstream tasks, such as ZP translation, is also consistently low: even state-of-the-art reasoning-oriented LLMs correctly translate fewer than half of Chinese ZPs into English.