🤖 AI Summary
Traditional Journal Entry Tests (JETs) suffer from high false-positive rates and limited sensitivity to subtle financial fraud. Method: This paper proposes an AI-augmented double-entry bookkeeping audit paradigm that leverages large language models (LLMs), including LLaMA and Gemma, to directly model ledger semantics and logical accounting constraints for end-to-end anomaly detection on both real and synthetic anonymized ledger data. The approach integrates structured accounting rules with natural-language reasoning, yielding not only anomaly classifications but also human-interpretable audit trails. Contribution/Results: Experiments show that LLMs substantially outperform JETs and classical machine-learning baselines, reducing false-positive rates by 37%–52% while maintaining high recall. The method also improves audit interpretability and the efficiency of human-AI collaboration. To our knowledge, this is the first systematic validation of the effectiveness and practicality of LLMs in structured financial auditing.
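The core accounting constraint that rule-based JETs enforce can be sketched minimally: in double-entry bookkeeping, each journal entry's debits must equal its credits. The entry format and tolerance below are illustrative assumptions, not the paper's implementation:

```python
# Minimal sketch of one rule-based JET check (hypothetical entry format):
# in double-entry bookkeeping, total debits must equal total credits
# within a small rounding tolerance.

def entry_is_balanced(lines, tol=0.01):
    """lines: list of (account, debit, credit) tuples for one journal entry."""
    total_debit = sum(debit for _, debit, _ in lines)
    total_credit = sum(credit for _, _, credit in lines)
    return abs(total_debit - total_credit) <= tol

# A balanced entry passes; an unbalanced one is flagged as an anomaly.
sale = [("Cash", 100.0, 0.0), ("Revenue", 0.0, 100.0)]
print(entry_is_balanced(sale))  # True
```

Checks like this are precise but brittle, which is why the paper pairs them with LLM-based reasoning over ledger semantics rather than relying on rules alone.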
📝 Abstract
Auditors rely on Journal Entry Tests (JETs) to detect anomalies in tax-related ledger records, but these rule-based methods generate overwhelming numbers of false positives and struggle with subtle irregularities. We investigate whether large language models (LLMs) can serve as anomaly detectors in double-entry bookkeeping. Benchmarking state-of-the-art LLMs such as LLaMA and Gemma on both synthetic and real-world anonymized ledgers, we compare them against JETs and machine-learning baselines. Our results show that LLMs consistently outperform traditional rule-based JETs and classical ML baselines, while also providing natural-language explanations that enhance interpretability. These results highlight the potential of **AI-augmented auditing**, where human auditors collaborate with foundation models to strengthen financial integrity.