🤖 AI Summary
This paper addresses machine learning–based prediction of credit spreads and credit ratings. We construct a comprehensive 167-dimensional credit risk framework incorporating macroeconomic, financial, bond-specific, and, as a novel addition, 30 proprietary non-financial firm-level indicators. Seven algorithms, including random forests and gradient boosting, are employed, augmented by feature importance analysis and mechanism testing, to develop an implied-rating model grounded in predicted credit spreads. Our key contributions are threefold: (i) we systematically introduce, for the first time, a large-scale set of non-financial indicators, which occupy seven of the top ten most important features and substantially enhance explanatory power; (ii) incorporating these non-financial variables more than doubles out-of-sample predictive performance; and (iii) the implied-rating model achieves accuracy, recall, and F1-score all exceeding 75%, and explains credit spreads better than the ratings of leading Chinese rating agencies.
📝 Abstract
We build a comprehensive set of 167 credit risk indicators that integrates macroeconomic, corporate financial, and bond-specific indicators with, for the first time, a large-scale set of 30 corporate non-financial indicators. Using seven machine learning models, we construct bond credit spread prediction models, test their predictive power for spreads and the underlying economic mechanisms, and verify their effectiveness in predicting credit ratings. The results show that these models outperform Chinese credit rating agencies in explaining credit spreads. Specifically, adding non-financial indicators more than doubles out-of-sample performance relative to models driven by traditional features alone. Mechanism analysis finds that non-financial indicators are far more important than traditional ones (macro-level, financial, and bond features): seven of the top 10 features are non-financial (e.g., corporate governance, property rights nature, information disclosure evaluation), and these are also the most stable predictors. Through these indicators, the models identify high-risk traits (deteriorating operations, short-term debt, higher financing constraints) that support spread prediction and risk identification. Finally, we pioneer a credit rating model based on predicted spreads (the predicted implied rating model); both the full and sub-industry models achieve accuracy, recall, and F1-score above 75%. This paper provides valuable guidance for bond default early warning, credit rating, and financial stability.