DeepRead: Document Structure-Aware Reasoning to Enhance Agentic Search

📅 2026-02-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes DeepRead, a structure-aware, multi-turn document reasoning agent. Existing retrieval methods treat long documents as flat text and ignore their hierarchical organization and discourse order, which limits complex question answering. DeepRead explicitly integrates a document's native hierarchical structure into the retrieval process: it uses an LLM-driven OCR pipeline to produce structured Markdown, builds a paragraph-level coordinate index, and exposes two complementary tools, structure-aware retrieval (Retrieve) and sequential, section-preserving reading (ReadSection), to emulate a human-like "locate-then-read" workflow. Across four long-document QA benchmarks, DeepRead outperforms Search-o1-style agentic search baselines by an average of 10.3%, and behavioral analysis indicates that the agent coordinates the two tools in a structure-guided way.

📝 Abstract

With the rapid advancement of tool-use capabilities in Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) is shifting from static, one-shot retrieval toward autonomous, multi-turn evidence acquisition. However, existing agentic search frameworks typically treat long documents as flat collections of unstructured chunks, disregarding the native hierarchical organization and sequential logic essential for human comprehension. To bridge this gap, we introduce DeepRead, a structure-aware document reasoning agent designed to operationalize document-native structural priors into actionable reasoning capabilities. Leveraging the structural fidelity of modern OCR, DeepRead constructs a paragraph-level, coordinate-based navigation system and equips the LLM with two synergistic tools: Retrieve for scanning-aware localization, and ReadSection for contiguous, order-preserving reading within specific hierarchical scopes. This design elicits a human-like "locate-then-read" reasoning paradigm, effectively mitigating the context fragmentation inherent in traditional retrieval methods. Extensive evaluations across four benchmarks spanning diverse document types demonstrate that DeepRead outperforms Search-o1-style agentic search baselines by an average of 10.3%. Fine-grained behavioral analysis further confirms that DeepRead autonomously adopts human-aligned reading strategies, validating the critical role of structural awareness in achieving precise document reasoning. Our code is available at https://github.com/Zhanli-Li/DeepRead.

Problem

Research questions and friction points this paper is trying to address.

agentic search

document structure

long-document question answering

retrieval-augmented generation

hierarchical organization

Innovation

Methods, ideas, or system contributions that make the work stand out.

structure-aware reasoning

agentic search

document hierarchy