🤖 AI Summary
Structured information extraction from scanned professional documents (e.g., geological and medical reports) faces challenges including high annotation costs, strong format heterogeneity, and poor generalizability. Method: This paper introduces the first “Human-in-the-Spiral” human-in-the-loop annotation paradigm, implemented as an integrated platform featuring document image preprocessing, a multimodal interactive annotation interface, an LLM-driven semantic alignment module, an incremental learning training engine, real-time evaluation dashboards, and RESTful model APIs, which together enable an end-to-end annotation, training, and feedback loop. Contribution/Results: Over three iterative spiral cycles, model performance improves consistently while per-document annotation time decreases by at least 41%. The open-source, freely available platform has been deployed in real-world geological and medical AI modeling tasks. Its core innovation lies in embedding human feedback directly into the model evolution pipeline, which reduces dependence on manual annotation and improves cross-domain generalization.
📝 Abstract
Acquiring structured data from domain-specific, image-based documents such as scanned reports is crucial for many downstream tasks but remains challenging due to document variability. Many of these documents exist as images rather than as machine-readable text, so human annotation is required to train automated extraction systems. We present DocSpiral, the first Human-in-the-Spiral assistive document annotation platform, designed to address the challenge of extracting structured information from domain-specific, image-based document collections. Our spiral design establishes an iterative cycle in which human annotations train models that progressively require less manual intervention. DocSpiral integrates document format normalization, comprehensive annotation interfaces, an evaluation metrics dashboard, and API endpoints for developing AI/ML models into a unified workflow. Experiments demonstrate that our framework reduces annotation time by at least 41% while showing consistent performance gains across three iterations of model training. By making this annotation platform freely accessible, we aim to lower barriers to AI/ML model development in document processing, facilitating the adoption of large language models in image-based, document-intensive fields such as geoscience and healthcare. The system is freely available at: https://app.ai4wa.com. A demonstration video is available at: https://app.ai4wa.com/docs/docspiral/demo.
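The spiral cycle described in the abstract can be illustrated with a minimal Python sketch. Everything in it is hypothetical and is not taken from DocSpiral: the names `template_of`, `human_label`, and `run_spiral`, the toy "model" (a dictionary from document template to layout label), and the sample batches are assumptions chosen only to show the shape of the loop, in which model suggestions are reviewed by a human and the corrected labels feed the next training round.

```python
from typing import Dict, List

# Toy "model" (an assumption for illustration): template name -> layout label.
Model = Dict[str, str]


def template_of(doc_id: str) -> str:
    """Extract the template name from a hypothetical id such as 'a:1'."""
    return doc_id.split(":")[0]


def human_label(doc_id: str) -> str:
    """Stand-in for the human annotator, who always knows the true label."""
    return "layout_" + template_of(doc_id)


def run_spiral(batches: List[List[str]]) -> List[int]:
    """Run one annotate-then-retrain cycle per batch.

    Returns the number of manual corrections needed in each round.
    """
    model: Model = {}
    labelled: Dict[str, str] = {}
    corrections_per_round: List[int] = []
    for batch in batches:
        corrections = 0
        for doc_id in batch:
            suggestion = model.get(template_of(doc_id))  # model pre-annotation
            truth = human_label(doc_id)                  # human review
            if suggestion != truth:
                corrections += 1                         # manual fix required
            labelled[template_of(doc_id)] = truth
        # "Retrain" after the round: the toy model memorises all labels so far.
        model = dict(labelled)
        corrections_per_round.append(corrections)
    return corrections_per_round


batches = [
    ["a:1", "b:1", "a:2"],
    ["a:3", "b:2", "c:1"],
    ["a:4", "b:3", "c:2"],
]
print(run_spiral(batches))
```

With these sample batches the number of corrections falls from round to round, because each retraining step lets the toy model pre-annotate templates it has already seen. A real system would replace the dictionary with a trained extraction model and measure annotation time rather than a correction count.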