NeuSym-RAG: Hybrid Neural Symbolic Retrieval with Multiview Structuring for PDF Question Answering

📅 2025-05-26
🤖 AI Summary
Problem: Existing RAG approaches for fine-grained question answering over academic PDFs keep neural and symbolic retrieval disconnected, and they fail to exploit document structure because they rely on single-view, layout-agnostic text chunking.

Method: We propose a collaborative multi-view RAG framework with three components: (1) a neural-symbolic dual-path retrieval mechanism in which semantic matching and exact matching complement each other dynamically; (2) a schema-driven, multi-view PDF parsing pipeline that extracts structured content such as sections, tables, and equations and uses it to populate both a relational database and a vector index; and (3) an iterative context collection strategy guided by an LLM agent.

Results: Evaluated on three full-PDF QA benchmarks, including AIRQA-REAL, our method significantly outperforms pure vector-based RAG and diverse structured baselines, achieving +12.6% absolute improvement in answer accuracy and +37.4% gain in structural awareness.

📝 Abstract
The increasing number of academic papers makes it hard for researchers to efficiently acquire key details. While retrieval-augmented generation (RAG) shows great promise in large language model (LLM) based automated question answering, previous works often isolate neural and symbolic retrieval despite their complementary strengths. Moreover, conventional single-view chunking neglects the rich structure and layout of PDFs, e.g., sections and tables. In this work, we propose NeuSym-RAG, a hybrid neural symbolic retrieval framework which combines both paradigms in an interactive process. By leveraging multi-view chunking and schema-based parsing, NeuSym-RAG organizes semi-structured PDF content into both a relational database and a vectorstore, enabling LLM agents to iteratively gather context until it is sufficient to generate answers. Experiments on three full PDF-based QA datasets, including a self-annotated one, AIRQA-REAL, show that NeuSym-RAG consistently outperforms both the vector-based RAG and various structured baselines, highlighting its capacity to unify both retrieval schemes and utilize multiple views. Code and data are publicly available at https://github.com/X-LANCE/NeuSym-RAG.
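
To make the idea of combining the two retrieval paradigms concrete, here is a minimal sketch that assumes a toy setup and is not the authors' implementation. The corpus, the table layout, the bag-of-words `embed` function (a stand-in for a real neural encoder), and the scripted `gather_context` loop are all hypothetical; in the framework described above, an LLM agent would choose each retrieval action itself.

```python
import math
import sqlite3
from collections import Counter

# Hypothetical toy corpus: (chunk_id, section, page, text). Not taken from the paper.
CHUNKS = [
    (1, "Introduction", 1, "retrieval augmented generation helps answer questions about papers"),
    (2, "Method", 3, "multi-view chunking stores sections and tables in a relational database"),
    (3, "Experiments", 6, "accuracy is reported on three pdf question answering datasets"),
]

def embed(text):
    """Stand-in for a neural encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Symbolic store: the same chunks as rows in a relational table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE chunks (chunk_id INTEGER PRIMARY KEY, section TEXT, page INTEGER, text TEXT)")
db.executemany("INSERT INTO chunks VALUES (?, ?, ?, ?)", CHUNKS)

# "Neural" store: one vector per chunk, keyed by the same chunk_id.
vectors = {cid: embed(text) for cid, _, _, text in CHUNKS}

def symbolic_retrieve(sql, params=()):
    """Exact, structured lookup via SQL."""
    return db.execute(sql, params).fetchall()

def neural_retrieve(query, top_k=1):
    """Similarity search; returns the ids of the closest chunks."""
    q = embed(query)
    ranked = sorted(vectors, key=lambda cid: cosine(q, vectors[cid]), reverse=True)
    return ranked[:top_k]

def gather_context(question):
    """Scripted two-step loop: similarity search first, then an exact SQL lookup."""
    context = []
    for cid in neural_retrieve(question):
        context.extend(
            symbolic_retrieve("SELECT section, page, text FROM chunks WHERE chunk_id = ?", (cid,))
        )
    return context

result = gather_context("which datasets are used for question answering")
print(result)
```

Because both stores share `chunk_id`, a hit from the similarity search can be resolved to its exact section and page through SQL, which is one simple way the two paths can complement each other.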

Problem

Research questions and friction points this paper is trying to address.

Combines neural and symbolic retrieval for PDF QA
Addresses the neglect of PDF structure and layout by single-view chunking
Enhances retrieval with multi-view and schema-based parsing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid neural symbolic retrieval framework
Multi-view chunking for PDF structuring
Schema-based parsing for relational organization
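
As an illustration of multi-view chunking, the sketch below derives several chunk granularities from one parsed document. It assumes a hypothetical parser output (`PARSED`) and a hypothetical `Chunk` record; the actual views and schema used by NeuSym-RAG may differ.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    paper_id: str
    view: str   # granularity that produced this chunk: section, paragraph, or table
    ref: str    # locator inside the view, e.g. a section title or table caption
    text: str

# Hypothetical output of a PDF parser; the field names are assumptions.
PARSED = {
    "paper_id": "demo-0001",
    "sections": [
        {"title": "Introduction",
         "paragraphs": ["RAG helps with paper QA.", "Structure is often ignored."]},
        {"title": "Method",
         "paragraphs": ["We store content in two backends."]},
    ],
    "tables": [
        {"caption": "Table 1: Dataset statistics",
         "rows": [["dataset", "size"], ["demo", "10"]]},
    ],
}

def multi_view_chunks(parsed):
    """Emit the same document at several granularities (views)."""
    pid = parsed["paper_id"]
    chunks = []
    for sec in parsed["sections"]:
        # Section view: the whole section as one chunk.
        chunks.append(Chunk(pid, "section", sec["title"], " ".join(sec["paragraphs"])))
        # Paragraph view: each paragraph on its own, located by section title and index.
        for i, para in enumerate(sec["paragraphs"]):
            chunks.append(Chunk(pid, "paragraph", f"{sec['title']}#{i}", para))
    for tab in parsed["tables"]:
        # Table view: rows flattened into a single searchable string.
        flat = "; ".join(", ".join(row) for row in tab["rows"])
        chunks.append(Chunk(pid, "table", tab["caption"], flat))
    return chunks

chunks = multi_view_chunks(PARSED)
views = sorted({c.view for c in chunks})
print(len(chunks), views)
```

Each `Chunk` could then be written both as a database row and as an embedded vector, so that a retriever can pick whichever view suits the question.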
Ruisheng Cao
Shanghai Jiao Tong University
LLM Agent, text-to-SQL, code generation, semantic parsing, dialogue systems
Hanchong Zhang
MoE Key Lab of Artificial Intelligence, Shanghai, China; X-LANCE Lab, School of Computer Science, Shanghai Jiao Tong University, Shanghai, China; Jiangsu Key Lab of Language Computing, Suzhou, China
Tiancheng Huang
Nanyang Technological University
Deep Learning, Graph Neural Network, LiDAR, 3D Point Cloud
Zhangyi Kang
MoE Key Lab of Artificial Intelligence, Shanghai, China; X-LANCE Lab, School of Computer Science, Shanghai Jiao Tong University, Shanghai, China; Jiangsu Key Lab of Language Computing, Suzhou, China
Yuxin Zhang
MoE Key Lab of Artificial Intelligence, Shanghai, China; X-LANCE Lab, School of Computer Science, Shanghai Jiao Tong University, Shanghai, China; Jiangsu Key Lab of Language Computing, Suzhou, China
Liangtai Sun
Master, Shanghai Jiao Tong University
NLP, GUI understanding, Multi-modal
Hanqi Li
MoE Key Lab of Artificial Intelligence, Shanghai, China; X-LANCE Lab, School of Computer Science, Shanghai Jiao Tong University, Shanghai, China; Jiangsu Key Lab of Language Computing, Suzhou, China
Yuxun Miao
MoE Key Lab of Artificial Intelligence, Shanghai, China; X-LANCE Lab, School of Computer Science, Shanghai Jiao Tong University, Shanghai, China; Jiangsu Key Lab of Language Computing, Suzhou, China
Shuai Fan
AISpeech Co., Ltd., Suzhou, China
Lu Chen
MoE Key Lab of Artificial Intelligence, Shanghai, China; X-LANCE Lab, School of Computer Science, Shanghai Jiao Tong University, Shanghai, China; Jiangsu Key Lab of Language Computing, Suzhou, China
Kai Yu
MoE Key Lab of Artificial Intelligence, Shanghai, China; X-LANCE Lab, School of Computer Science, Shanghai Jiao Tong University, Shanghai, China; Jiangsu Key Lab of Language Computing, Suzhou, China