🤖 AI Summary
This work addresses the challenge of recovering structured Markdown from document page images while preserving both textual content and layout structure. To this end, the authors propose ParseFixer, a framework that first uses MinerU2.5 Pro to generate an initial Markdown representation and then applies agent-based selective multimodal refinement. This refinement uses multimodal verification and a rollback strategy to identify and correct high-value structural errors while retaining reliable parsing results. On the DataMFM Challenge Track 1 test set, ParseFixer achieves an overall score of 61.78, ranking third in the track.
📝 Abstract
In this report, we present our third-place solution for the DataMFM Challenge Track 1: Document Parsing. This track requires models to recover structured Markdown documents from document page images while preserving both textual content and document structure. To address the complementary requirements of accurate content recovery and faithful structure reconstruction, we propose ParseFixer, an agentic framework for backbone parsing and selective correction. ParseFixer consists of two key modules: Full-Page Backbone Parsing (FBP) and Agentic Selective Correction (ASC). FBP produces stable initial Markdown outputs with MinerU2.5 Pro, while ASC detects high-value parsing failures and repairs them through a verify-and-rollback correction process. By placing selective multimodal correction after open-source backbone parsing, ParseFixer improves the recovery of key document elements without rewriting reliable backbone predictions. On the test set, our final system achieves an overall score of 61.78 and ranks third in Track 1. Our code will be released at: https://github.com/iLearn-Lab/CVPRW26-ParseFixer.
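The abstract does not specify how ASC is implemented, so the following is only a minimal sketch of the select, repair, verify, and rollback control flow it describes. Everything in it is a hypothetical stand-in: `Block`, `needs_correction`, and `selective_correct` are illustrative names, the ragged-table check stands in for whatever failure detector ParseFixer actually uses, and `toy_repair`/`toy_verify` stand in for multimodal model calls. MinerU2.5 Pro is not invoked; the blocks are assumed to be backbone output that has already been split by element type.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass
class Block:
    kind: str      # hypothetical element type, e.g. "table" or "text"
    markdown: str  # Markdown produced by the backbone parser (FBP stage)


def column_counts(markdown: str) -> Set[int]:
    """Number of cells in each pipe-table row of a Markdown snippet."""
    rows = [r.strip() for r in markdown.splitlines() if r.strip().startswith("|")]
    return {r.strip("|").count("|") + 1 for r in rows}


def needs_correction(block: Block) -> bool:
    """Hypothetical failure detector: flag tables whose rows have different cell counts."""
    return block.kind == "table" and len(column_counts(block.markdown)) > 1


def selective_correct(
    blocks: List[Block],
    repair: Callable[[Block], Optional[str]],
    verify: Callable[[Block, str], bool],
) -> List[str]:
    """Repair only flagged blocks; fall back to the backbone output if verification fails."""
    result = []
    for block in blocks:
        if not needs_correction(block):
            result.append(block.markdown)  # reliable backbone output is kept untouched
            continue
        candidate = repair(block)
        if candidate is not None and verify(block, candidate):
            result.append(candidate)       # accepted correction
        else:
            result.append(block.markdown)  # rollback to the backbone prediction
    return result


# Toy stand-ins for the multimodal repair and verification calls (assumptions).
def toy_repair(block: Block) -> Optional[str]:
    return "| a | b |\n|---|---|\n| 1 | 2 |"


def toy_verify(block: Block, candidate: str) -> bool:
    # Accept only a rectangular table; a real verifier would compare against the page image.
    return len(column_counts(candidate)) == 1


blocks = [
    Block("text", "Intro paragraph."),
    Block("table", "| a | b |\n|---|---|\n| 1 | 2 | 3 |"),
]
print(selective_correct(blocks, toy_repair, toy_verify))
```

The design point this sketch tries to capture is that a block changes only when a candidate passes verification, so a failed or rejected repair leaves the backbone prediction exactly as it was rather than degrading it.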