Fighting Numerical Hallucinations via Data-centric Compilation for Online Financial QA

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses numerical reasoning hallucinations in large language models used for financial question answering. The authors propose the Data-centric Reasoning Compiler (DCRC), a framework that transforms user queries and retrieved documents into verifiable, executable programs through adversarial data synthesis, multi-stage training, and a compile-and-execute inference mechanism. By combining explicit evidence auditing with program synthesis, DCRC targets three challenges of retrieval-augmented generation: noise sensitivity, calculation fragility, and lack of auditability. The authors evaluate DCRC on established offline benchmarks and report that it has been deployed in a real-world online financial QA system.
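To make the idea of adversarial data synthesis with controlled noise more concrete, the following is a minimal sketch, not the authors' procedure. It assumes that "controlled noise" can be approximated by mixing a fixed number of distractor passages into the gold evidence and recording which passages are gold. The function name `build_noisy_example`, its parameters, and the output layout are all hypothetical.

```python
import random
from typing import Dict, List


def build_noisy_example(
    question: str,
    gold_passages: List[str],
    distractor_pool: List[str],
    num_distractors: int,
    seed: int = 0,
) -> Dict[str, object]:
    """Mix gold passages with sampled distractors (hypothetical noise model).

    The number of distractors is the noise knob; the is_gold label is kept so
    a model could later be trained to audit which evidence is relevant.
    """
    if num_distractors > len(distractor_pool):
        raise ValueError("not enough distractors in the pool")

    rng = random.Random(seed)  # seeded for reproducible examples
    distractors = rng.sample(distractor_pool, num_distractors)

    passages = [{"text": p, "is_gold": True} for p in gold_passages]
    passages += [{"text": p, "is_gold": False} for p in distractors]
    rng.shuffle(passages)  # hide the gold passage position

    return {"question": question, "passages": passages}


example = build_noisy_example(
    question="What was revenue growth from 2022 to 2023?",
    gold_passages=["Revenue was 100.0 in 2022 and 120.0 in 2023."],
    distractor_pool=[
        "Operating costs were 80.0 in 2023.",
        "Revenue of a competitor was 150.0 in 2023.",
        "Headcount grew by 5 percent in 2022.",
    ],
    num_distractors=2,
    seed=1,
)
```

Raising `num_distractors` increases the noise level per example, which is one simple way to control difficulty; the paper may use a different or richer noise model.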

📝 Abstract

Large Language Models (LLMs) have significantly advanced online data services, particularly in the domain of financial question answering (FinQA). However, such systems remain susceptible to numerical reasoning hallucinations, which critically undermine reliability in high-stakes financial applications. Although retrieval-augmented generation (RAG) has been widely adopted to ground responses in external knowledge, it introduces three persistent challenges: noise sensitivity, calculation fragility, and an auditability crisis. Existing model-centric approaches, which primarily focus on optimizing either the retriever or generator in isolation, still struggle to address these issues in an integrated manner. In this work, we pioneer a data-centric paradigm and propose a novel framework, the Data-centric Reasoning Compiler (DCRC). The framework operates through three cohesive phases: (1) adversarial data construction, which synthesizes training examples with controlled noise to teach robustness; (2) multi-stage training that cultivates a Data-centric Structuring Agent (DSA) capable of explicit evidence auditing and program synthesis; and (3) a compile-and-execute inference process, where the DSA transforms user queries and retrieved documents into verifiable, executable reasoning programs. This data-driven framework ensures faithful numerical reasoning by design. We conduct extensive experiments on established offline benchmarks and further validate our framework through deployment in a real-world online financial QA system.
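The compile-and-execute phase described above can be illustrated with a small sketch. It assumes that the model emits a short program of arithmetic steps over audited evidence, and that a deterministic executor, rather than the language model, computes the final number. The program format, the `Evidence` class, and the `execute` function are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Evidence:
    """An audited number plus its source, kept for traceability."""
    value: float
    source: str


# A step is (operation, left operand, right operand). An operand is an
# evidence id ("e1"), a reference to an earlier step ("#0"), or a literal.
Operand = Union[str, float]
Step = Tuple[str, Operand, Operand]

OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}


def resolve(arg: Operand, evidence: Dict[str, Evidence], results: List[float]) -> float:
    """Turn an operand into a number."""
    if isinstance(arg, str):
        if arg.startswith("#"):
            return results[int(arg[1:])]  # result of an earlier step
        return evidence[arg].value  # audited evidence value
    return float(arg)


def execute(program: List[Step], evidence: Dict[str, Evidence]) -> Tuple[float, List[str]]:
    """Run the program step by step and return the answer with an audit trace."""
    if not program:
        raise ValueError("empty program")
    results: List[float] = []
    trace: List[str] = []
    for i, (op, left, right) in enumerate(program):
        if op not in OPS:
            raise ValueError(f"unknown operation: {op}")
        a = resolve(left, evidence, results)
        b = resolve(right, evidence, results)
        out = OPS[op](a, b)
        results.append(out)
        trace.append(f"#{i} = {op}({a}, {b}) = {out}")
    return results[-1], trace


# Illustrative inputs: revenue growth = (120 - 100) / 100.
evidence = {
    "e1": Evidence(120.0, "annual report 2023, revenue"),
    "e2": Evidence(100.0, "annual report 2022, revenue"),
}
program: List[Step] = [("subtract", "e1", "e2"), ("divide", "#0", "e2")]
answer, trace = execute(program, evidence)
```

Because every intermediate value appears in `trace` and every input carries a `source`, a reviewer can check both the arithmetic and the provenance of each number, which is the kind of auditability the abstract refers to.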

Problem

Research questions and friction points this paper is trying to address.

numerical hallucinations

financial question answering

retrieval-augmented generation

noise sensitivity

auditability

Innovation

Methods, ideas, or system contributions that make the work stand out.

data-centric compilation

numerical hallucination

financial QA