🤖 AI Summary
Problem: Low efficiency and poor generalizability in parsing unstructured financial earnings reports hinder automated financial analysis.
Method: This paper proposes a dual-agent large language model framework tailored for finance: an Extraction Agent that automatically identifies, standardizes, and validates KPIs; and a Text-to-SQL Agent enabling schema-agnostic natural language ad-hoc querying. We introduce the first collaborative multi-agent architecture that decouples structured information extraction from semantic querying, integrating prompt engineering, structured pipelines, and human-in-the-loop verification.
Contribution/Results: Evaluated on real-world financial reports, our approach achieves 95% KPI extraction accuracy—on par with human experts—and 91% correctness in NL2SQL response generation, with robust cross-document generalization. It significantly advances end-to-end automation, scalability, and practicality of financial report structuring and analysis, overcoming key performance and deployment limitations of traditional rule-based systems and fine-tuned models.
📝 Abstract
Extracting structured, quantitative insights from unstructured financial filings is essential in investment research, yet it remains time-consuming and resource-intensive. Conventional approaches rely heavily on labor-intensive manual processes, which limits scalability and delays the research workflow. In this paper, we propose an efficient and scalable method for accurately extracting quantitative insights from unstructured financial documents using a multi-agent system composed of large language models. The system consists of two specialized agents: the Extraction Agent and the Text-to-SQL Agent. The Extraction Agent automatically identifies key performance indicators (KPIs) in unstructured financial text, standardizes their formats, and verifies their accuracy. The Text-to-SQL Agent generates executable SQL statements from natural language queries, allowing users to access the structured data accurately without requiring familiarity with the database schema. Our experiments show that the system both transforms unstructured text into structured data accurately and enables precise retrieval of key information. First, the system achieves approximately 95% accuracy in transforming financial filings into structured data, matching the performance level typically attained by human annotators. Second, in a human evaluation of the retrieval task, where natural language queries are used to search the structured data, 91% of the responses were rated as correct by human evaluators. In both evaluations, our system generalizes well across financial document types, consistently delivering reliable performance.
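To make the division of labor between the two agents concrete, here is a minimal sketch of the data flow only. It is not the paper's implementation: the LLM calls are replaced by hypothetical stand-ins (a regular expression for the Extraction Agent, a keyword lookup for the Text-to-SQL Agent), the verification step is omitted, and the `kpi` table schema, function names, company name, and sample filing text are all assumptions made for illustration.

```python
import re
import sqlite3

# Assumed schema for the structured store; the paper does not specify one.
SCHEMA = "CREATE TABLE kpi (company TEXT, period TEXT, name TEXT, value REAL)"

# Hypothetical stand-in for the LLM-based Extraction Agent.
KPI_PATTERN = re.compile(
    r"(?P<name>revenue|net income)\s+(?:was|of)\s+"
    r"\$(?P<value>\d+(?:\.\d+)?)\s+(?P<unit>million|billion)",
    re.IGNORECASE,
)
UNIT_SCALE = {"million": 1e6, "billion": 1e9}


def extract_kpis(company, period, text):
    """Identify KPIs in free text and standardize their values to plain USD."""
    rows = []
    for m in KPI_PATTERN.finditer(text):
        value = float(m.group("value")) * UNIT_SCALE[m.group("unit").lower()]
        rows.append((company, period, m.group("name").lower(), value))
    return rows


def question_to_sql(question):
    """Hypothetical stand-in for the Text-to-SQL Agent.

    Returns a parameterized query so the caller never needs to know the schema.
    """
    q = question.lower()
    for name in ("revenue", "net income"):
        if name in q:
            sql = "SELECT company, period, value FROM kpi WHERE name = ?"
            return sql, (name,)
    raise ValueError("question not covered by this sketch")


def answer(conn, question):
    """Translate a natural language question to SQL and execute it."""
    sql, params = question_to_sql(question)
    return conn.execute(sql, params).fetchall()


# Made-up filing text, used only to exercise the pipeline.
filing = "Revenue was $1.5 billion for the quarter, and net income was $200 million."

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO kpi VALUES (?, ?, ?, ?)",
    extract_kpis("ExampleCo", "2024Q1", filing),
)
print(answer(conn, "What was ExampleCo's revenue?"))
```

The point of the sketch is the decoupling: extraction writes normalized rows into a table once, and querying works only against that table, so either stand-in could be swapped for an LLM-backed component without touching the other.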