PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training

📅 2026-06-02
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the errors that document parsing models still make in under-optimized regions (regions where model behavior is unstable, data coverage is sparse, or supervision is unreliable). It proposes a region-aware data optimization framework coupled with a progressive post-training strategy based on curated data selection and reinforcement learning. The approach identifies the regions where the previous model is weak, applies targeted enhancement to them, and improves the reliability of supervision signals. Built on PaddleOCR-VL-1.5, the resulting PaddleOCR-VL-1.6 achieves a score of 96.33% on OmniDocBench v1.6, which the authors report as a new state of the art for document parsing.
📝 Abstract
We introduce PaddleOCR-VL-1.6, an upgraded compact document parsing model built upon PaddleOCR-VL-1.5. Although PaddleOCR-VL-1.5 establishes a strong 0.9B baseline, its remaining errors concentrate in under-optimized regions where model behavior is unstable, data coverage is sparse, or supervision is unreliable. Rather than expanding the training corpus indiscriminately, PaddleOCR-VL-1.6 introduces a region-aware data optimization framework that identifies weak regions from the previous model, applies targeted enhancement to these regions, and improves the reliability of supervision signals. It further adopts a progressive post-training recipe based on curated data selection and reinforcement learning, pushing model performance to a higher level through staged optimization. PaddleOCR-VL-1.6 achieves a new state-of-the-art score of 96.33% on OmniDocBench v1.6, demonstrates strong competitiveness against top-tier VLMs, and provides a practical post-training recipe for the PaddleOCR-VL series.
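The abstract names three symptoms of an under-optimized region: unstable behavior, sparse data, and unreliable supervision. It does not say how such regions are detected. As a rough illustration only, the sketch below flags regions from per-sample evaluation scores of a previous model. The function names, the thresholds, the region labels, and the sampling boost are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_weak_regions(records, min_count=3, min_mean=0.9, max_std=0.15):
    """Flag regions that look under-optimized.

    records: iterable of (region_label, score) pairs, where score is a
    per-sample quality score in [0, 1] from the previous model.
    All thresholds are illustrative assumptions, not values from the paper.
    """
    by_region = defaultdict(list)
    for region, score in records:
        by_region[region].append(score)

    weak = {}
    for region, scores in by_region.items():
        reasons = []
        if len(scores) < min_count:      # too few samples: data sparsity
            reasons.append("sparse")
        if mean(scores) < min_mean:      # low average quality
            reasons.append("low_score")
        if pstdev(scores) > max_std:     # high spread: unstable behavior
            reasons.append("unstable")
        if reasons:
            weak[region] = reasons
    return weak


def sampling_weights(records, weak, boost=3.0):
    """Upweight samples from weak regions instead of growing the corpus uniformly."""
    return [boost if region in weak else 1.0 for region, _ in records]


# Hypothetical evaluation records for a previous model.
records = [
    ("plain_text", 0.98), ("plain_text", 0.97), ("plain_text", 0.99),
    ("handwritten_formula", 0.55), ("handwritten_formula", 0.95),
    ("handwritten_formula", 0.60),
    ("rotated_table", 0.92),
]
weak = find_weak_regions(records)
weights = sampling_weights(records, weak)
print(weak)
print(weights)
```

In this toy setup, a region can be flagged for more than one reason, and the weights simply bias a sampler toward flagged regions. A real pipeline would also need a way to audit labels in those regions, which this sketch does not attempt.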
Problem

Research questions and friction points this paper is trying to address.

document parsing
under-optimized regions
supervision reliability
model instability
data sparsity
Innovation

Methods, ideas, or system contributions that make the work stand out.

region-aware optimization
under-optimized region refinement
progressive post-training
reinforcement learning
document parsing