🤖 AI Summary
This study addresses the significant degradation in recognition accuracy of Kuzushiji (classical Japanese cursive script) characters caused by seal interference. To tackle this challenge, the authors propose a three-stage restoration-guided Kuzushiji character recognition (RG-KCR) framework: first, character detection is performed with YOLOv12-medium, achieving a precision of 98.0% and a recall of 93.3%; second, image inpainting is applied to remove seal artifacts; and third, character classification is carried out by Metom, a Vision Transformer (ViT)-based classifier. The work introduces a restoration-guided mechanism specifically designed to handle seal occlusions and constructs benchmark datasets for Kuzushiji character detection and classification. Experimental results demonstrate that the restoration stage improves Top-1 classification accuracy from 93.45% to 95.33%, with restoration quality quantitatively evaluated using the PSNR and SSIM metrics.
📝 Abstract
Kuzushiji was one of the most popular writing styles in pre-modern Japan and was widely used in both personal letters and official documents. However, due to its highly cursive forms and extensive glyph variations, most modern Japanese readers cannot directly interpret Kuzushiji characters. Recent research has therefore focused on automated Kuzushiji character recognition, which has achieved satisfactory performance on relatively clean Kuzushiji document images. Existing methods, however, struggle to maintain recognition accuracy under seal interference (e.g., when seals overlap characters), despite the frequent occurrence of seals in pre-modern Japanese documents. To address this challenge, we propose a three-stage restoration-guided Kuzushiji character recognition (RG-KCR) framework specifically designed to mitigate seal interference. We construct datasets for evaluating Kuzushiji character detection (Stage 1) and classification (Stage 3). Experimental results show that the YOLOv12-medium model achieves a precision of 98.0% and a recall of 93.3% on the constructed test set. We quantitatively evaluate the restoration performance of Stage 2 using PSNR and SSIM. In addition, we conduct an ablation study showing that Stage 2 improves the Top-1 accuracy of Metom, a Vision Transformer (ViT)-based Kuzushiji classifier employed in Stage 3, from 93.45% to 95.33%. The implementation code of this work is available at https://ruiyangju.github.io/RG-KCR.
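The three-stage detect → restore → classify structure described above can be sketched as a simple pipeline. Everything below is an illustrative toy, not the paper's implementation: the actual stages use YOLOv12-medium (detection), an image-inpainting model (restoration), and the Metom ViT classifier (recognition), whereas here each stage is a trivial stand-in operating on a string "page".

```python
# Toy sketch of the three-stage RG-KCR pipeline (detect -> restore -> classify).
# All function names and data structures are hypothetical placeholders.

def rg_kcr(page, detector, inpainter, classifier):
    """Run restoration-guided recognition over one page."""
    labels = []
    for crop in detector(page):              # Stage 1: locate character regions
        restored = inpainter(crop)           # Stage 2: remove seal artifacts
        labels.append(classifier(restored))  # Stage 3: classify the clean crop
    return labels

# Stand-ins: each whitespace-separated token is one detected character crop,
# and '#' marks seal pixels overlapping a glyph.
detector = str.split
inpainter = lambda crop: crop.replace("#", "")
classifier = lambda crop: crop  # identity "recognition" for the demo

print(rg_kcr("a# b c#", detector, inpainter, classifier))  # → ['a', 'b', 'c']
```

The point of the structure is that Stage 3 never sees seal pixels: the classifier is trained and run on restored crops, which is what drives the reported Top-1 gain from 93.45% to 95.33%.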