GeoRAG: A Question-Answering Approach from a Geographical Perspective

📅 2025-04-02
🤖 AI Summary
Traditional geographic question-answering (GeoQA) systems are limited in semantic understanding, retrieval accuracy, interactivity, and the handling of complex spatial reasoning tasks. To address these challenges, we propose GeoRAG, a knowledge-enhanced Retrieval-Augmented Generation (RAG) framework designed specifically for the geographic domain. The method integrates a multi-label query-type classifier based on BERT-Base-Chinese, a structured knowledge base built through multi-agent collaboration (about 145K classified entries and 875K multi-dimensional QA pairs), and a RAG architecture augmented with domain-specific components. Key contributions include: (1) a seven-dimensional geographic knowledge taxonomy; (2) a multi-label query classification model; (3) a retrieval evaluator driven by QA pairs; and (4) dimension-aware, dynamic GeoPrompt templates. Comparative experiments show that GeoRAG outperforms conventional RAG across multiple base models, indicating good generalizability and practical value for real-world GeoQA applications.
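The multi-label query classifier can be pictured through its decoding step. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the BERT-Base-Chinese model has already produced one logit per geographic dimension for a query, applies an independent sigmoid to each logit, and keeps every dimension whose probability reaches a threshold. The 0.5 threshold, the fallback to the single most likely dimension, and the function names are hypothetical choices.

```python
import math

# The seven geographic knowledge dimensions named in the abstract.
GEO_DIMENSIONS = [
    "semantic understanding",
    "spatial location",
    "geometric morphology",
    "attribute characteristics",
    "feature relationships",
    "evolutionary processes",
    "operational mechanisms",
]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_query_dimensions(logits, threshold=0.5):
    """Map one logit per dimension to the list of dimensions a query touches.

    Each dimension gets its own sigmoid, so a query may belong to several
    dimensions at once. The threshold value is a hypothetical default.
    """
    if len(logits) != len(GEO_DIMENSIONS):
        raise ValueError(f"expected {len(GEO_DIMENSIONS)} logits, got {len(logits)}")
    probs = [sigmoid(z) for z in logits]
    selected = [dim for dim, p in zip(GEO_DIMENSIONS, probs) if p >= threshold]
    if not selected:
        # Hypothetical fallback: keep the single most probable dimension.
        best = max(range(len(probs)), key=probs.__getitem__)
        selected = [GEO_DIMENSIONS[best]]
    return selected
```

For example, logits of `[-0.4, 2.1, -1.5, -0.3, 1.4, -2.0, -0.8]` give probabilities of roughly 0.40, 0.89, 0.18, 0.43, 0.80, 0.12, and 0.31, so the query is tagged with spatial location and feature relationships. Independent sigmoids, rather than a softmax over all seven dimensions, are what allow one query to carry several labels.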

📝 Abstract
Geographic Question Answering (GeoQA) addresses natural-language queries in the geographic domain to meet complex user demands and improve information retrieval efficiency. Traditional QA systems, however, suffer from limited comprehension, low retrieval accuracy, weak interactivity, and inadequate handling of complex tasks, which hinders precise information acquisition. This study presents GeoRAG, a knowledge-enhanced QA framework that integrates domain-specific fine-tuning and prompt engineering with Retrieval-Augmented Generation (RAG) to improve both the accuracy of geographic knowledge retrieval and the quality of user interaction. The methodology has four components:

(1) A structured geographic knowledge base built from 3,267 corpus documents (research papers, monographs, and technical reports) and categorized via a multi-agent approach into seven dimensions: semantic understanding, spatial location, geometric morphology, attribute characteristics, feature relationships, evolutionary processes, and operational mechanisms. This yielded 145,234 classified entries and 875,432 multi-dimensional QA pairs.
(2) A multi-label text classifier based on BERT-Base-Chinese, trained to identify query types by classifying them along the geographic dimensions.
(3) A retrieval evaluator that uses the QA pair data to assess query-document relevance and thereby improve retrieval precision.
(4) GeoPrompt templates that dynamically combine user queries with retrieved information, improving response quality through dimension-specific prompting.

Comparative experiments show that GeoRAG outperforms conventional RAG across multiple base models, supporting its generalizability. This work advances geographic AI by proposing a new paradigm for deploying large language models in domain-specific contexts, with implications for improving the scalability and accuracy of GeoQA systems in real-world applications.
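Component (4) can be sketched as dimension-conditioned prompt assembly. The example below is a hypothetical illustration: the instruction strings in `DIMENSION_HINTS`, the `build_geoprompt` function, its `max_passages` parameter, and the prompt layout are all assumptions, since the abstract does not give the actual GeoPrompt wording. It shows only the idea of tailoring the prompt to the dimensions predicted for a query.

```python
# Hypothetical per-dimension instructions; not the paper's actual GeoPrompt text.
DIMENSION_HINTS = {
    "semantic understanding": "Define the geographic concepts involved.",
    "spatial location": "State where the features are located, using coordinates or place names when available.",
    "geometric morphology": "Describe shape, size, and spatial form.",
    "attribute characteristics": "List the relevant attributes and their values.",
    "feature relationships": "Explain how the features relate to one another spatially or functionally.",
    "evolutionary processes": "Describe how the features have changed over time.",
    "operational mechanisms": "Explain the processes or mechanisms that drive the phenomenon.",
}


def build_geoprompt(query, dimensions, passages, max_passages=3):
    """Assemble a prompt from a query, its predicted dimensions, and retrieved passages."""
    # One focus instruction per predicted dimension; unknown labels are skipped.
    hints = [f"- {DIMENSION_HINTS[d]}" for d in dimensions if d in DIMENSION_HINTS]
    # Number the top passages so the model can refer to them.
    context = [f"[{i}] {p}" for i, p in enumerate(passages[:max_passages], start=1)]
    sections = [
        "You are a geography expert. Answer using only the context below.",
        "Context:\n" + ("\n".join(context) if context else "(no relevant passages retrieved)"),
        "Focus on:\n" + ("\n".join(hints) if hints else "- Answer the question directly."),
        f"Question: {query}",
        "Answer:",
    ]
    return "\n\n".join(sections)
```

A query tagged only with spatial location receives only the location instruction, so the prompt steers the model toward the aspects the classifier detected instead of a generic answer.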

Problem

Enhances geographic knowledge retrieval accuracy using RAG technology
Improves user interaction in geographic question answering systems
Addresses limitations of traditional QA systems in complex geographic tasks

Innovation

Integrates domain-specific fine-tuning with RAG technology
Constructs a structured geographic knowledge base from 3,267 corpus documents
Uses a multi-label query classifier and a QA-pair-based retrieval evaluator to improve retrieval precision
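The QA-pair-driven retrieval evaluator can be approximated as a re-ranking filter. The sketch below deliberately swaps in a simple technique, token-level Jaccard similarity between the query and the questions attached to each candidate document, in place of whatever relevance model the paper actually uses. The assumption that each retrieved document carries its own generated QA pairs, the 0.3 threshold, and all function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercased word tokens. Stand-in only: for Chinese text, a run of Han
    characters becomes a single token, so a real system would use a word segmenter."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def qa_relevance(query, qa_pairs):
    """Score a document by its best match between the query and its stored questions."""
    q = tokenize(query)
    return max((jaccard(q, tokenize(question)) for question, _answer in qa_pairs), default=0.0)


def evaluate_retrieval(query, candidates, threshold=0.3):
    """Filter and re-rank retrieved documents using their QA pairs.

    candidates: list of (doc_id, qa_pairs), where qa_pairs is a list of (question, answer).
    Returns (doc_id, score) tuples at or above the threshold, best first.
    """
    scored = [(doc_id, qa_relevance(query, qa)) for doc_id, qa in candidates]
    kept = [item for item in scored if item[1] >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

For instance, against a candidate whose stored question is "where is the yellow river source", the query "Where is the source of the Yellow River?" scores 6/7 (about 0.86), while a candidate asking "what shape is the delta" scores 0.2 and is dropped. Matching against generated questions rather than raw passage text is one plausible reading of "QA-pair-driven"; a production system would likely replace `jaccard` with embedding similarity or a trained cross-encoder.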

Jian Wang
Zhuo Zhao
Zheng Jie Wang
Bo Da Cheng
Lei Nie
Wen Luo (Peking University)
Zhao Yuan Yu
Ling Wang Yuan