🤖 AI Summary
To address the poor integration between model development and model risk management (MRM) in financial services, this paper proposes the first multi-agent collaborative framework designed jointly for financial modeling and MRM. It orchestrates two specialized agent teams, one for modeling and one for MRM, that use large language models (LLMs) for task decomposition, domain-knowledge-enhanced prompting, structured documentation generation, and automated model validation. Together the teams collaborate autonomously, end to end, across data exploration, feature engineering, model training, regulatory compliance review, reproducibility verification, and conceptual soundness assessment. The key contribution is the tight coupling of modeling and MRM within a unified multi-agent architecture, aimed at ensuring both model performance and regulatory adherence. The authors report improvements in modeling efficiency, interpretability, and regulatory compliance in experiments on credit card fraud detection, credit card approval, and portfolio credit risk modeling datasets.
📝 Abstract
The advent of large language models has ushered in a new era of agentic systems, where artificial intelligence programs exhibit remarkable autonomous decision-making capabilities across diverse domains. This paper explores agentic system workflows in the financial services industry. In particular, we build agentic crews that can effectively collaborate to perform complex modeling and model risk management (MRM) tasks. The modeling crew consists of a manager and multiple agents who perform specific tasks such as exploratory data analysis, feature engineering, model selection, hyperparameter tuning, model training, model evaluation, and writing documentation. The MRM crew consists of a manager along with specialized agents who perform tasks such as checking compliance of modeling documentation, model replication, conceptual soundness, analysis of outcomes, and writing documentation. We demonstrate the effectiveness and robustness of modeling and MRM crews by presenting a series of numerical examples applied to credit card fraud detection, credit card approval, and portfolio credit risk modeling datasets.
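The two-crew structure described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the names `Agent`, `Crew`, `stub`, and `build_crew` are hypothetical, each agent is a stub standing in for an LLM call, and the manager is reduced to a loop that delegates tasks in a fixed order. Only the role names are taken from the abstract.

```python
from dataclasses import dataclass, field
from typing import Callable

# All class and function names here are hypothetical, chosen for illustration.

@dataclass
class Agent:
    role: str
    handle: Callable[[dict], str]  # stand-in for an LLM-backed task


@dataclass
class Crew:
    name: str
    agents: list[Agent] = field(default_factory=list)

    def run(self, context: dict) -> dict:
        """Simplified manager: delegate to each agent in order, sharing context."""
        for agent in self.agents:
            context[agent.role] = agent.handle(context)
        return context


def stub(role: str) -> Agent:
    # Placeholder agent that only reports completion of its task.
    return Agent(role, lambda ctx, r=role: f"{r}: ok")


def build_crew(name: str, roles: list[str]) -> Crew:
    return Crew(name, [stub(r) for r in roles])


# Role lists follow the tasks named in the abstract.
MODELING_ROLES = [
    "exploratory data analysis", "feature engineering", "model selection",
    "hyperparameter tuning", "model training", "model evaluation",
    "documentation",
]
MRM_ROLES = [
    "documentation compliance", "model replication", "conceptual soundness",
    "analysis of outcomes", "documentation",
]

modeling_crew = build_crew("modeling", MODELING_ROLES)
mrm_crew = build_crew("mrm", MRM_ROLES)

# The MRM crew reviews what the modeling crew produced.
modeling_out = modeling_crew.run({"dataset": "credit_card_fraud"})
mrm_out = mrm_crew.run({"modeling_output": modeling_out})
```

Passing the modeling crew's output into the MRM crew's context mirrors the review relationship the abstract describes; a real system would replace the stubs with LLM-backed agents and let the manager decide task order dynamically.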