Specializing Large Models for Oracle Bone Script Interpretation via Component-Grounded Multimodal Knowledge Augmentation

📅 2026-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the persistent “interpretation gap” in oracle bone script decipherment. Individual characters are often rare or unique, yet they are built from a limited set of recurring components with transferable meanings, and existing closed-set recognition methods do not exploit these shared components. To bridge this gap, we propose an agent-based vision-language framework that integrates component-level semantic structure into oracle bone interpretation. Our approach combines fine-grained visual grounding, knowledge graph retrieval, and reasoning by a large language model agent to construct interpretation chains that are interpretable and transfer across characters. We also introduce OB-Radix, a new expert-annotated dataset, and show that the framework produces more detailed and precise decipherments than baseline methods across three benchmark tasks.
📝 Abstract
Deciphering ancient Chinese Oracle Bone Script (OBS) is a challenging task that offers insights into the beliefs, systems, and culture of the ancient era. Existing approaches treat decipherment as a closed-set image recognition problem, which fails to bridge the “interpretation gap”: while individual characters are often unique and rare, they are composed of a limited set of recurring, pictographic components that carry transferable semantic meanings. To leverage this structural logic, we propose an agent-driven Vision-Language Model (VLM) framework that integrates a VLM for precise visual grounding with an LLM-based agent to automate a reasoning chain of component identification, graph-based knowledge retrieval, and relationship inference for linguistically accurate interpretation. To support this, we also introduce OB-Radix, an expert-annotated dataset providing structural and semantic data absent from prior corpora, comprising 1,022 character images (934 unique characters) and 1,853 fine-grained component images across 478 distinct components with verified explanations. By evaluating our system on three benchmarks covering different tasks, we demonstrate that our framework yields more detailed and precise decipherments than baseline methods.
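To make the reasoning chain described in the abstract more concrete (component identification, graph-based knowledge retrieval, relationship inference), here is a minimal Python sketch under stated assumptions. It assumes the VLM grounding step has already returned labeled component boxes. The `Component` type, the toy `KG` entries, the coarse spatial-relation heuristic, and the prompt wording are all hypothetical illustrations, not the authors' implementation or OB-Radix data, and the actual LLM call is left out.

```python
from dataclasses import dataclass

# Hypothetical sketch of a component-grounded reasoning chain.
# All names, entries, and the prompt format are illustrative assumptions.


@dataclass(frozen=True)
class Component:
    label: str                      # component identity, assumed to come from VLM grounding
    box: tuple[int, int, int, int]  # (x0, y0, x1, y1) pixel box


# Toy component knowledge graph stored as an adjacency dict:
# component node -> list of (relation, value) edges. Placeholder entries only.
KG: dict[str, list[tuple[str, str]]] = {
    "木": [("depicts", "tree"), ("semantic_field", "plants, wood")],
    "日": [("depicts", "sun"), ("semantic_field", "day, time")],
}


def _contains(outer: tuple[int, int, int, int], inner: tuple[int, int, int, int]) -> bool:
    """True if box `inner` lies entirely within box `outer`."""
    ox0, oy0, ox1, oy1 = outer
    ix0, iy0, ix1, iy1 = inner
    return ox0 <= ix0 and oy0 <= iy0 and ix1 <= ox1 and iy1 <= oy1


def spatial_relation(a: Component, b: Component) -> str:
    """Coarse layout cue describing component a relative to component b."""
    if _contains(b.box, a.box):
        return "inside"
    if _contains(a.box, b.box):
        return "contains"
    if a.box[2] <= b.box[0]:
        return "left-of"
    if b.box[2] <= a.box[0]:
        return "right-of"
    if a.box[3] <= b.box[1]:
        return "above"
    if b.box[3] <= a.box[1]:
        return "below"
    return "overlaps"


def retrieve(components: list[Component]) -> dict[str, list[tuple[str, str]]]:
    """Graph lookup step: unknown components get an empty fact list."""
    return {c.label: KG.get(c.label, []) for c in components}


def build_prompt(components: list[Component]) -> str:
    """Assemble the evidence an LLM agent would reason over to infer relationships."""
    facts = retrieve(components)
    lines = ["Components and retrieved knowledge:"]
    for c in components:
        described = "; ".join(f"{rel}: {val}" for rel, val in facts[c.label]) or "no entry"
        lines.append(f"- {c.label}: {described}")
    lines.append("Layout:")
    for i, a in enumerate(components):
        for b in components[i + 1:]:
            lines.append(f"- {a.label} {spatial_relation(a, b)} {b.label}")
    lines.append("Infer how the components' meanings combine into the character's meaning.")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical grounding output for one character image.
    detections = [Component("木", (0, 0, 100, 100)), Component("日", (30, 30, 70, 70))]
    print(build_prompt(detections))
```

In this sketch the relationship-inference step is delegated to whatever LLM receives the prompt. The layout cue is included only as one kind of structural evidence a reasoner could weigh alongside the retrieved component meanings.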
Problem

Research questions and friction points this paper is trying to address.

Oracle Bone Script
interpretation gap
component-based decipherment
multimodal knowledge
ancient script interpretation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Oracle Bone Script
Component-Grounded Reasoning
Vision-Language Model
Multimodal Knowledge Augmentation
OB-Radix Dataset
Jianing Zhang
Purdue University
Federated Learning, Multiple Agent Systems, Differential Privacy
Runan Li
College of Software, Jilin University
Honglin Pang
School of Artificial Intelligence, Jilin University
Ding Xia
Doctoral Student, The University of Tokyo
computer science
Zhou Zhu
School of Archaeology, Jilin University
Qian Zhang
Professor, Jilin University
finite element method, numerical analysis
Chuntao Li
School of Archaeology, Jilin University; Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MoE, China
Xi Yang
Jilin University
computer graphics, computer vision, deep learning, user interaction