🤖 AI Summary
Braille information processing faces two major challenges: data scarcity and ambiguity in mixed-text content, particularly mathematical notation. To address these, we construct the English and Chinese Braille Mixed Datasets (EBMD/CBMD), the first bilingual braille datasets featuring mixed text including LaTeX-encoded mathematical formulas, and propose a syntax-tree-based augmentation strategy to improve data efficiency in this low-resource setting. We further introduce Braille Knowledge-Based Fine-Tuning (BKFT), which BrailleLLM applies via instruction tuning to unify braille translation, mathematical formula-to-braille conversion, and mixed-text braille rendering in a single multi-task paradigm, explicitly injecting braille grammar and encoding rules (e.g., the UEB and CBE standards) into model training. Experiments demonstrate that BKFT significantly outperforms conventional fine-tuning baselines on braille translation tasks, marking the first application of instruction tuning to multi-task braille language modeling. We publicly release both the datasets and code, establishing critical infrastructure for low-resource, multilingual braille research.
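To make the multi-task instruction-tuning idea concrete, here is a minimal sketch of what a knowledge-injected training sample might look like: one instruction format covers all three tasks, with the relevant braille encoding rule stated explicitly in the prompt. The rule text, field names, and braille cells below are illustrative assumptions, not the released dataset schema.

```python
# Hypothetical multi-task instruction sample in the spirit of BKFT:
# the instruction carries an explicit braille encoding rule so the
# model learns the rule alongside the translation pair.
# All strings here are illustrative, not from the actual dataset.

TASK_RULES = {
    "text_to_braille": "Apply Unified English Braille (UEB) contractions.",
    "formula_to_braille": "Render LaTeX math using the braille math code.",
    "mixed_to_braille": "Switch between literary and math braille as needed.",
}

def build_sample(task: str, source: str, target: str) -> dict:
    """Wrap one (source, target) pair as an instruction-tuning example,
    prepending the task's braille encoding rule to the instruction."""
    rule = TASK_RULES[task]
    return {
        "instruction": f"Translate the input to braille. Rule: {rule}",
        "input": source,
        "output": target,
    }

sample = build_sample(
    "formula_to_braille",
    r"$x^2$",
    "⠭⠔⠆",  # illustrative braille cells, not verified math braille
)
```

A single format like this lets one model be trained jointly on all three tasks, which is the unification the summary describes.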
📝 Abstract
Braille plays a vital role in education and information accessibility for visually impaired individuals. However, Braille information processing faces challenges such as data scarcity and ambiguities in mixed-text contexts. We construct English and Chinese Braille Mixed Datasets (EBMD/CBMD) with mathematical formulas to support diverse Braille domain research, and propose a syntax tree-based augmentation method tailored for Braille data. To address the underperformance of traditional fine-tuning methods in Braille-related tasks, we investigate Braille Knowledge-Based Fine-Tuning (BKFT), which reduces the learning difficulty of Braille contextual features. BrailleLLM employs BKFT via instruction tuning to achieve unified Braille translation, formula-to-Braille conversion, and mixed-text translation. Experiments demonstrate that BKFT achieves significant performance improvements over conventional fine-tuning in Braille translation scenarios. Our open-sourced datasets and methodologies establish a foundation for low-resource multilingual Braille research.
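The syntax tree-based augmentation mentioned in the abstract can be sketched as follows: represent a formula as a small expression tree, then generate new training formulas by substituting leaf operands while preserving the syntactic structure. The node shapes, the seed formula, and the substitution pool are illustrative assumptions, not the paper's actual method details.

```python
# Hypothetical sketch of syntax-tree-based augmentation for formula
# data: starting from a hand-built expression tree, create variants by
# substituting leaf symbols while keeping the syntactic structure (and
# hence the braille rendering pattern) fixed. Illustrative only.
import itertools

def render(node) -> str:
    """Serialize an expression tree back to LaTeX."""
    if isinstance(node, str):   # leaf: a variable or number
        return node
    op, left, right = node      # internal node: (operator, lhs, rhs)
    if op == "frac":
        return f"\\frac{{{render(left)}}}{{{render(right)}}}"
    return f"{render(left)} {op} {render(right)}"

def substitute(node, mapping):
    """Replace leaf symbols according to `mapping`, keeping structure."""
    if isinstance(node, str):
        return mapping.get(node, node)
    op, left, right = node
    return (op, substitute(left, mapping), substitute(right, mapping))

def augment(tree, leaves, pool):
    """Yield variants of `tree` with the named leaves drawn from `pool`."""
    for combo in itertools.product(pool, repeat=len(leaves)):
        yield substitute(tree, dict(zip(leaves, combo)))

seed = ("frac", "x", ("+", "y", "1"))   # \frac{x}{y + 1}
variants = [render(t) for t in augment(seed, ["x", "y"], ["a", "b"])]
```

Because the tree structure is unchanged, each variant exercises the same braille encoding pattern, which is what makes this kind of augmentation useful in a low-resource setting.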