ReforMe: Re-Shaping Documents with Contextual Prompting and Layout-Aware Propagation

📅 2026-06-02
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the challenge of accurately and efficiently digitizing complex documents containing handwritten content, irregular tables, and heterogeneous layouts, tasks that remain difficult for conventional OCR systems and current large language models. The authors propose an interactive document digitization system that integrates layout-aware parsing, OCR, and a large language model, enhanced by a user-in-the-loop correction propagation mechanism. Leveraging layout-aware inference, the system automatically generalizes user edits or natural language instructions applied to a local region to structurally similar regions across the document. In a user study (n=12), this approach significantly improved correction efficiency, reduced repetitive manual operations, and enabled more controllable and effective reconstruction of document structure and content.
📝 Abstract
Digitizing complex documents with handwritten content, irregular tables, and heterogeneous layouts remains challenging, as traditional Optical Character Recognition (OCR) systems fail to capture writing nuances, author-specific conventions, and document structure, and recent LLM-based approaches lack mechanisms for precise, scalable correction. We present an interactive document digitization system that integrates layout-aware parsing, OCR, and LLM-based reconstruction with user-driven refinement. The system is informed by a formative study that identifies key challenges and interaction needs in real-world digitization workflows. It supports both direct edits and natural-language instructions, and introduces a layout-aware propagation mechanism that generalizes user corrections across structurally similar regions. This enables not only efficient error correction but also document re-shaping into structured, analyzable representations. We evaluate the system through a within-subjects user study (n=12) on real-world documents. Results show improved correction efficiency and reduced repetitive effort, demonstrating a more effective and controllable document digitization procedure.
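
The idea of propagating one user correction to structurally similar regions can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how similarity is computed or how corrections are applied, so the `Region` structure, the `is_structurally_similar` rule (same region kind and same table column), and the `normalize_date` edit are all hypothetical stand-ins. In particular, a simple rule-based match is used here in place of the layout-aware inference the paper describes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Region:
    """A parsed layout region (hypothetical structure, not taken from the paper)."""
    region_id: str
    kind: str      # e.g. "table_cell" or "paragraph"
    column: int    # column index inside a table, -1 when not applicable
    text: str


def is_structurally_similar(a: Region, b: Region) -> bool:
    # Assumed similarity rule: same region kind and same table column.
    return a.kind == b.kind and a.column == b.column


def propagate_correction(
    regions: List[Region],
    source_id: str,
    correction: Callable[[str], str],
) -> List[Region]:
    """Apply the correction made on one region to every similar region.

    Returns a new list and leaves the input regions unmodified.
    """
    source = next(r for r in regions if r.region_id == source_id)
    updated = []
    for r in regions:
        if is_structurally_similar(source, r):
            updated.append(Region(r.region_id, r.kind, r.column, correction(r.text)))
        else:
            updated.append(r)
    return updated


def normalize_date(text: str) -> str:
    # Example user edit (assumed): rewrite "DD/MM/YY" as "20YY-MM-DD".
    parts = text.split("/")
    if len(parts) != 3:
        return text
    day, month, year = parts
    return f"20{year}-{month}-{day}"


regions = [
    Region("r1", "table_cell", 0, "03/04/21"),
    Region("r2", "table_cell", 1, "Flour"),
    Region("r3", "table_cell", 0, "17/04/21"),
    Region("r4", "paragraph", -1, "Notes: 1/2/3"),
]

# The user fixes the date in r1; the same fix is applied to r3 (same column),
# while r2 (different column) and r4 (different kind) are left alone.
result = propagate_correction(regions, "r1", normalize_date)
```

In this sketch the user corrects a single date cell and the same transformation reaches the other cell in that column, which is the kind of repetitive manual work the propagation mechanism is meant to remove. A real system would presumably let the user review propagated changes before accepting them.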
Problem

Research questions and friction points this paper is trying to address.

document digitization
handwritten content
irregular tables
heterogeneous layouts
OCR limitations
Innovation

Methods, ideas, or system contributions that make the work stand out.

layout-aware propagation
interactive document digitization
LLM-based reconstruction
user-driven refinement
structured document representation