🤖 AI Summary
To address context-length limitations and scarce annotated data in long-document Visual Question Answering (DocVQA) for low-resource settings, this paper proposes a unified adaptive framework. It integrates sparse-dense hybrid text retrieval for efficient key-paragraph localization; employs a multi-level verification mechanism for high-quality, automatic question-answer generation, enabling robust data augmentation; and introduces adaptive ensemble inference with dynamic configuration generation and early-stopping strategies to improve robustness and generalization. On the JDocQA benchmark, the framework achieves 83.04% accuracy on yes/no questions, 52.66% on factual questions, and 44.12% on numerical questions, surpassing prior methods. On the LAVA dataset, it attains 59.0% accuracy, establishing a new state-of-the-art for Japanese DocVQA.
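The sparse-dense hybrid retrieval described above can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes BM25 as the sparse scorer and uses a plain bag-of-words cosine as a stand-in for a real dense embedding model, fusing the two with a hypothetical weight `alpha`.

```python
# Illustrative sketch of sparse-dense hybrid retrieval (assumed form, not
# the paper's exact method): BM25 sparse scores + a toy "dense" cosine
# score, min-max normalized and fused with weight alpha.
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against the tokenized query with BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))  # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def hybrid_retrieve(query, docs, alpha=0.5, top_k=1):
    """Fuse normalized sparse and dense scores; the 'dense' vector here is
    a bag-of-words proxy for an actual embedding model."""
    sparse = bm25_scores(query, docs)
    qv = Counter(query)
    dense = [cosine(qv, Counter(d)) for d in docs]
    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    fused = [alpha * s + (1 - alpha) * c
             for s, c in zip(norm(sparse), norm(dense))]
    ranked = sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)
    return ranked[:top_k]
```

In practice the dense scorer would be a neural sentence encoder; the fusion weight and normalization scheme are design choices the paper's retriever would tune.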
📝 Abstract
Document Visual Question Answering (Document VQA) faces significant challenges when processing long documents in low-resource environments due to context limitations and insufficient training data. This paper presents AdaDocVQA, a unified adaptive framework addressing these challenges through three core innovations: a hybrid text retrieval architecture for effective document segmentation, an intelligent data augmentation pipeline that automatically generates high-quality reasoning question-answer pairs with multi-level verification, and adaptive ensemble inference with dynamic configuration generation and early stopping mechanisms. Experiments on Japanese document VQA benchmarks demonstrate substantial improvements, with 83.04% accuracy on Yes/No questions, 52.66% on factual questions, and 44.12% on numerical questions on JDocQA, and 59% accuracy on the LAVA dataset. Ablation studies confirm meaningful contributions from each component, and our framework establishes new state-of-the-art results for Japanese document VQA while providing a scalable foundation for other low-resource languages and specialized domains. Our code is available at: https://github.com/Haoxuanli-Thu/AdaDocVQA.
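The adaptive ensemble inference with early stopping can be sketched as follows. This is a hedged illustration under assumed semantics, not the paper's algorithm: inference configurations are tried in sequence, answers are tallied, and inference halts once one answer reaches a vote threshold, avoiding the cost of running every configuration.

```python
# Hedged sketch of ensemble inference with early stopping (assumed form):
# run model configurations one by one, tally their answers, and stop as
# soon as a candidate answer reaches the vote threshold.
from collections import Counter

def ensemble_with_early_stop(run_config, configs, threshold=3):
    """run_config: callable mapping one inference configuration to an
    answer string (hypothetical interface). Returns the first answer to
    reach `threshold` votes, else the plurality answer."""
    votes = Counter()
    for cfg in configs:
        votes[run_config(cfg)] += 1
        answer, count = votes.most_common(1)[0]
        if count >= threshold:
            return answer  # early stop: consensus reached
    return votes.most_common(1)[0][0]  # fall back to plurality vote
```

A real system would generate `configs` dynamically (e.g. varying prompts or decoding parameters) rather than taking a fixed list, which is the "dynamic configuration generation" the abstract refers to.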