🤖 AI Summary
Low-resource languages encode vital cultural and historical knowledge yet suffer from data scarcity, inadequate model adaptation, and insufficient cultural sensitivity. To address these challenges, we propose the first large language model (LLM) application framework tailored for humanities research on low-resource languages. Our method integrates instruction fine-tuning, few-shot prompting, multilingual knowledge distillation, and cultural-context alignment, augmented by domain-specific knowledge graphs and sparse-label enhancement to enable culturally grounded fine-tuning and ethics-aware data governance. Experimental results demonstrate that our customized models achieve 32–57% accuracy improvements over baselines on three core digital humanities tasks: classical text transcription, endangered dialect analysis, and oral history structuring. As a community resource, we release LinguaHumanis v1.0, an open-source, task-diverse evaluation benchmark that provides both methodological foundations and practical implementation guidelines for low-resource language research in the digital humanities.
📝 Abstract
Low-resource languages serve as invaluable repositories of human history, embodying cultural evolution and intellectual diversity. Despite their significance, these languages face critical challenges, including data scarcity and technological limitations, which hinder their comprehensive study and preservation. Recent advances in large language models (LLMs) offer transformative opportunities for addressing these challenges, enabling innovative methodologies in linguistic, historical, and cultural research. This study systematically evaluates the applications of LLMs in low-resource language research, encompassing linguistic variation, historical documentation, cultural expressions, and literary analysis. By analyzing technical frameworks, current methodologies, and ethical considerations, this paper identifies key challenges such as data accessibility, model adaptability, and cultural sensitivity. Given the cultural, historical, and linguistic richness inherent in low-resource languages, this work emphasizes interdisciplinary collaboration and the development of customized models as promising avenues for advancing research in this domain. By underscoring the potential of integrating artificial intelligence with the humanities to preserve and study humanity's linguistic and cultural heritage, this study aims to support global efforts toward safeguarding intellectual diversity.