InsQABench: Benchmarking Chinese Insurance Domain Question Answering with Large Language Models

📅 2025-01-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit insufficient domain-specific understanding and fine-grained reasoning capabilities in the Chinese insurance domain. Method: We introduce InsQABench, the first multi-task question-answering benchmark tailored to this domain, covering three realistic scenarios: commonsense knowledge, structured databases, and unstructured documents. We propose a hierarchical QA evaluation framework and a dual-path collaborative reasoning method that covers both data paradigms: SQL-ReAct for structured query execution and RAG-ReAct for retrieval-augmented reasoning over unstructured documents. Domain-adaptive fine-tuning and multi-source insurance data integration further enhance model performance. Contribution/Results: Our approach significantly improves LLM accuracy in insurance terminology comprehension and policy clause reasoning. All benchmark data, implementation code, and evaluation protocols are publicly released, establishing foundational infrastructure for standardized AI deployment in the insurance industry.

📝 Abstract
The application of large language models (LLMs) has achieved remarkable success in various fields, but their effectiveness in specialized domains like the Chinese insurance industry remains underexplored. The complexity of insurance knowledge, encompassing specialized terminology and diverse data types, poses significant challenges for both models and users. To address this, we introduce InsQABench, a benchmark dataset for the Chinese insurance sector, structured into three categories: Insurance Commonsense Knowledge, Insurance Structured Database, and Insurance Unstructured Documents, reflecting real-world insurance question-answering tasks. We also propose two methods, SQL-ReAct and RAG-ReAct, to tackle challenges in structured and unstructured data tasks. Evaluations show that while LLMs struggle with domain-specific terminology and nuanced clause texts, fine-tuning on InsQABench significantly improves performance. Our benchmark establishes a solid foundation for advancing LLM applications in the insurance domain, with data and code available at https://github.com/HaileyFamo/InsQABench.git.
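The abstract names SQL-ReAct without describing its internals. As a rough illustration only, and not the authors' implementation, the sketch below shows what a generic ReAct-style loop over a SQL database might look like: a policy alternates between a thought, a SQL action, and the observation returned by the database until it decides to finish. The `scripted_policy` function is a hypothetical stand-in for an LLM, and the `products` table, its columns, and the `SafeLife` product are invented for the example.

```python
import sqlite3


def scripted_policy(question, history):
    """Hypothetical stand-in for an LLM; returns (thought, action, argument)."""
    if not history:
        # First step: inspect which tables exist before writing a real query.
        return ("I need to see which tables exist.", "sql",
                "SELECT name FROM sqlite_master WHERE type='table'")
    if len(history) == 1:
        return ("Now I can look up the premium.", "sql",
                "SELECT annual_premium FROM products WHERE name = 'SafeLife'")
    last_rows = history[-1][2]
    return ("The last observation holds the answer.", "finish",
            str(last_rows[0][0]))


def react_sql(question, conn, policy, max_steps=5):
    """Run a thought/action/observation loop until the policy finishes."""
    history = []  # list of (thought, sql, observation) tuples
    for _ in range(max_steps):
        thought, action, argument = policy(question, history)
        if action == "finish":
            return argument, history
        try:
            observation = conn.execute(argument).fetchall()
        except sqlite3.Error as exc:
            # Feed the error back so the policy could revise its query.
            observation = [("SQL error: %s" % exc,)]
        history.append((thought, argument, observation))
    return None, history


# Toy in-memory database (illustrative schema, not from InsQABench).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, annual_premium REAL)")
conn.execute("INSERT INTO products VALUES ('SafeLife', 1200.0)")

answer, trace = react_sql(
    "What is the annual premium of SafeLife?", conn, scripted_policy
)
```

In a real system the policy would be an LLM call that receives the question and the accumulated trace; returning SQL errors as observations is one common way to let the model repair a failed query on the next step.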
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Insurance Industry
Complex Knowledge Processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

InsQABench
SQL-ReAct
RAG-ReAct
Authors

Jing Ding
Huazhong University of Science and Technology, Wuhan, China
Kai Feng
Northwestern Polytechnical University
Computational imaging, spectral imaging, deep learning
Binbin Lin
Huazhong University of Science and Technology, Wuhan, China
Jiarui Cai
AWS AI
Qiushi Wang
Fudan University, Shanghai, China
Yu Xie
Purple Mountain Laboratories, Nanjing, China
Xiaojin Zhang
Huazhong University of Science and Technology, Wuhan, China
Zhongyu Wei
Fudan University, Shanghai, China
Wei Chen
Huazhong University of Science and Technology, Wuhan, China