🤖 AI Summary
To address persistent hallucinations, inaccurate knowledge retrieval, and inadequate data-privacy protection when large language models (LLMs) answer financial questions for non-expert users, this paper proposes a localized Retrieval-Augmented Generation (RAG) system for Chinese financial question answering. Methodologically, it introduces a cascaded reranking framework that integrates BGE-M3 dense retrieval with BGE-reranker, and it constructs a trustworthy, domain-specific knowledge base by fusing Chinese Wikipedia and Lawbank. The system adopts a lightweight, on-premises deployment architecture to ensure data privacy and operational autonomy. Evaluated on the TTQA and TMMLU+ benchmarks through a two-stage assessment (an automated evaluation plus a human study with 20 participants), the system substantially mitigates hallucinations: automated accuracy improves markedly, and participants' average answer correctness rises by 32.5% in the human study, confirming its effectiveness and practical utility.
📝 Abstract
This study develops a question-answering system based on Retrieval-Augmented Generation (RAG), using Chinese Wikipedia and Lawbank as retrieval sources and TTQA and TMMLU+ as evaluation datasets. The system employs BGE-M3 for dense vector retrieval to obtain highly relevant search results and BGE-reranker to reorder these results by query relevance. The most pertinent retrieval results then serve as reference knowledge for a Large Language Model (LLM), enhancing its ability to answer questions and establishing a knowledge retrieval system grounded in generative AI. The system's effectiveness is assessed through a two-stage evaluation: an automatic performance evaluation and an assisted performance evaluation. The automatic evaluation computes accuracy by comparing the model's auto-generated labels with ground-truth answers, measuring performance under standardized conditions without human intervention. The assisted evaluation involves 20 finance-related multiple-choice questions answered by 20 participants without financial backgrounds: participants first answer independently, then answer again with system-generated reference information, to examine whether the system improves accuracy when assistance is provided. The main contributions of this research are: (1) Enhanced LLM capability: by integrating BGE-M3 and BGE-reranker, the system retrieves and reorders highly relevant results, reduces hallucinations, and dynamically accesses authorized or public knowledge sources. (2) Improved data privacy: a customized RAG architecture enables local operation of the LLM, eliminating the need to send private data to external servers; this enhances data security, reduces reliance on commercial services, lowers operational costs, and mitigates privacy risks.
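The retrieve-then-rerank cascade described in the abstract can be sketched as follows. This is a minimal illustration with toy stand-ins, not the paper's implementation: in the real system the dense vectors would come from BGE-M3 and the second-stage score from BGE-reranker; here `overlap_score` is a hypothetical placeholder for that cross-encoder relevance score.

```python
import numpy as np

def dense_retrieve(query_vec, passage_vecs, k=5):
    """Stage 1: shortlist the top-k passages by cosine similarity.

    In the paper's system the embeddings come from BGE-M3; here they
    are plain NumPy vectors standing in for those embeddings.
    """
    q = query_vec / np.linalg.norm(query_vec)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    scores = p @ q
    return list(np.argsort(-scores)[:k])

def rerank(query, passages, candidate_ids, cross_score, n=3):
    """Stage 2: rescore the shortlist with a query-passage scorer
    (BGE-reranker in the paper) and keep the top-n as LLM context."""
    ranked = sorted(candidate_ids,
                    key=lambda i: cross_score(query, passages[i]),
                    reverse=True)
    return ranked[:n]

def overlap_score(query, passage):
    """Toy stand-in for a cross-encoder: fraction of the query's
    characters appearing in the passage (purely illustrative)."""
    return len(set(query) & set(passage)) / max(len(set(query)), 1)
```

In the deployed system, the top-n passages returned by the second stage would be concatenated into the LLM prompt as reference knowledge, which is how the retrieval grounds the model's answer.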