Transfer Learning for Loan Recovery Prediction under Distribution Shifts with Heterogeneous Feature Spaces

📅 2026-04-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the challenge of predicting loan recovery rates under conditions of scarce default data, covariate and conditional distribution shifts between source and target domains, and heterogeneous feature spaces. To tackle these issues, this work proposes the FT-MDN-Transformer model, which introduces distribution-aware transfer learning to recovery rate prediction in heterogeneous settings for the first time. The model integrates a Mixture Density Network (MDN) with a Tabular Transformer to jointly produce loan-level point estimates and portfolio-level probabilistic distributions. By explicitly accounting for cross-domain distributional discrepancies, the approach substantially outperforms existing baselines when target-domain data are limited, yielding probabilistic predictions that closely align with observed recovery distributions.

📝 Abstract

Accurate forecasting of recovery rates (RR) is central to credit risk management and regulatory capital determination. In many loan portfolios, however, RR modeling is constrained by data scarcity arising from infrequent default events. Transfer learning (TL) offers a promising avenue to mitigate this challenge by exploiting information from related but richer source domains, yet its effectiveness critically depends on the presence and strength of distributional shifts, and on potential heterogeneity between source and target feature spaces. This paper introduces FT-MDN-Transformer, a mixture-density tabular Transformer architecture specifically designed for TL in RR forecasting across heterogeneous feature sets. The model produces both loan-level point estimates and portfolio-level predictive distributions, thereby supporting a wide range of practical RR forecasting applications. We evaluate the proposed approach in a controlled Monte Carlo simulation that facilitates systematic variation of covariate, conditional, and label shifts, as well as in a real-world transfer setting using the Global Credit Data (GCD) loan dataset as source and a novel bonds dataset as target. Our results show that FT-MDN-Transformer outperforms baseline models when target-domain data are limited, with particularly pronounced gains under covariate and conditional shifts, while label shift remains challenging. We also observe its probabilistic forecasts to closely track empirical recovery distributions, providing richer information than conventional point-prediction metrics alone. Overall, the findings highlight the potential of distribution-aware TL architectures to improve RR forecasting in data-scarce credit portfolios and offer practical insights for risk managers operating under heterogeneous data environments.

Problem

Research questions and friction points this paper is trying to address.

loan recovery prediction

distribution shifts

heterogeneous feature spaces

data scarcity

transfer learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transfer Learning

Mixture Density Network

Tabular Transformer

Distribution Shift

Recovery Rate Prediction

🔎 Similar Papers

No similar papers found.

Authors to Follow