🤖 AI Summary
This study addresses interpretable cross-sectional return prediction in the large-scale A-share market and quantifies the contributions of different factor groups. It develops an interpretable machine learning framework that pairs XGBoost with TreeSHAP and combines SHAP attribution with ablation analysis to uncover substitutability structure among features and to measure the relative importance of behavioral signals versus valuation factors. Out of sample, the model achieves a monthly average AUC of 0.547, and a long-short portfolio delivers an average monthly excess return of 2.38% (annualized Sharpe ratio of 2.23). Behavioral signals account for 58.2% of feature contributions on average, compared with 10.7% for valuation factors.
📝 Abstract
We present an interpretable machine learning pipeline that decomposes cross-sectional equity return predictability into auditable factor contributions. We apply an XGBoost model with TreeSHAP attribution and conduct stress testing on 3,632 Chinese A-share stocks from 2009 to 2019. Using 60-month rolling windows over 55 months of out-of-sample data, XGBoost attains a mean AUC of 0.547 and a long-short spread between the top and bottom quintiles of +2.38% per month (Newey-West t = 5.94; annualized Sharpe ratio 2.23). This alpha persists after adjusting for the Carhart four-factor model (+2.31% per month; t = 7.48). SHAP decomposition indicates that behavioral signals (turnover and momentum) account for 58.2% of predictive attribution, compared with 10.7% for valuation ratios, on average across 55 industry groups. Ablation analysis cross-validates this ranking and provides evidence that SHAP and ablation diverge in ways that reveal a feature substitutability structure largely invisible to either method used in isolation.
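The headline quantities in the abstract (quintile long-short spread, annualized Sharpe ratio, and group-level attribution share) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the equal-weighted quintiles, the sqrt(12) annualization, and the definition of a group's share as its fraction of total absolute SHAP value are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from statistics import mean, stdev


def long_short_spread(scores, returns, n_buckets=5):
    """Top-minus-bottom bucket return for one month's cross-section.

    Assumes equal weighting within buckets (hypothetical choice).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k = len(order) // n_buckets  # stocks per bucket
    if k == 0:
        raise ValueError("need at least n_buckets stocks")
    bottom, top = order[:k], order[-k:]
    return mean(returns[i] for i in top) - mean(returns[i] for i in bottom)


def annualized_sharpe(monthly_spreads):
    """Monthly mean over sample standard deviation, scaled by sqrt(12)."""
    return mean(monthly_spreads) / stdev(monthly_spreads) * math.sqrt(12)


def group_shares(shap_rows, feature_groups):
    """Fraction of total absolute SHAP value attributed to each group.

    shap_rows: one list of per-feature SHAP values per observation.
    feature_groups: group label for each feature column (assumed mapping).
    """
    totals = {}
    for row in shap_rows:
        for value, group in zip(row, feature_groups):
            totals[group] = totals.get(group, 0.0) + abs(value)
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("all SHAP values are zero")
    return {group: total / grand_total for group, total in totals.items()}
```

In practice the SHAP rows would come from a TreeSHAP explainer applied to the fitted model each month, and the monthly spreads would be collected across the out-of-sample months before computing the Sharpe ratio; the Newey-West t-statistic reported in the abstract requires a HAC standard error and is not reproduced here.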