InkFM: A Foundational Model for Full-Page Online Handwritten Note Understanding

📅 2025-03-29
🤖 AI Summary
Problem: The absence of a unified foundation model hinders holistic understanding of full-page online handwritten notes. Method: We propose the first vision-language foundation model for multilingual handwritten documents, integrating multilingual text recognition, mathematical expression recognition, and page-structure parsing (text/drawing segmentation) within a single homogeneous architecture. Our approach unifies document image encoding, serialized layout modeling, and script-adaptive decoding heads, supporting zero-shot text-line segmentation and LoRA-based fine-tuning for cross-task generalization. Results: The model outperforms baselines such as docTR on zero-shot text-line segmentation. With lightweight fine-tuning, it attains state-of-the-art results on five public datasets (DeepWriting, CASIA, SCUT, MathWriting, and QuickDraw). It supports 28 scripts, mathematical expression recognition, and page-element parsing.

📝 Abstract
Tablets and styluses are increasingly popular for taking notes. To optimize this experience and ensure a smooth and efficient workflow, it is important to develop methods that accurately interpret the content of handwritten digital notes. We introduce InkFM, a foundational model for analyzing full pages of handwritten content. Trained on a diverse mixture of tasks, the model offers a unique combination of capabilities: recognizing text in 28 different scripts, recognizing mathematical expressions, and segmenting pages into distinct elements such as text and drawings. Our results demonstrate that these tasks can be effectively unified within a single model, which achieves state-of-the-art out-of-the-box text line segmentation quality, surpassing public baselines like docTR. Fine-tuning or LoRA-tuning the base model on public datasets further improves page segmentation quality and achieves state-of-the-art text recognition (DeepWriting, CASIA, SCUT, and MathWriting datasets) and sketch classification (QuickDraw). This adaptability makes InkFM a powerful starting point for developing applications with handwritten input.
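
The abstract mentions LoRA-tuning without giving details. As a rough illustration of the general LoRA idea only (not InkFM's actual configuration), the sketch below keeps a base weight matrix W frozen and adds a trainable low-rank update, so the effective weight is W + (alpha / r) * B A. The matrix sizes, the rank, the `alpha` value, and the helper names `matmul` and `lora_effective_weight` are all assumptions chosen for this example; the abstract does not state which layers are adapted or which rank is used.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank.

    Shapes (hypothetical, for illustration): W is (d_out, d_in),
    A is (r, d_in), B is (d_out, r).
    """
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # low-rank update with shape (d_out, d_in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy sizes, assumed for the example only.
d_out, d_in, rank = 4, 6, 2
rng = random.Random(0)

w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]    # frozen base weight
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # trainable, small random init
b = [[0.0] * rank for _ in range(d_out)]                              # trainable, zero init

w_eff = lora_effective_weight(w, a, b, alpha=4.0)

# Only A and B are trained, so the adapter has r * (d_in + d_out) parameters.
trainable = rank * (d_in + d_out)
full = d_in * d_out
```

Because B starts at zero, the adapted layer initially behaves exactly like the base layer; tuning then moves only A and B. At these toy sizes the saving is small, but the adapter grows linearly with the layer dimensions while the full matrix grows with their product.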
Problem

Research questions and friction points this paper is trying to address.

Develops InkFM for full-page handwritten note analysis
Unifies text, math, and drawing recognition in one model
Achieves state-of-the-art performance across multiple datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified model for multi-task handwritten analysis
Recognizes text in 28 scripts as well as mathematical expressions
Achieves SoTA in segmentation and text recognition