🤖 AI Summary
This work addresses the challenge of bias in financial language models, which hinders their real-world deployment, and the high computational cost of existing bias detection methods, which impedes their integration into continuous training pipelines. The authors propose a cross-model guided bias detection framework, demonstrating that five financial language models flag highly consistent bias-inducing samples across protected attributes. Leveraging this consistency, they show for the first time that the outputs of a single lightweight model, such as DistilRoBERTa, can efficiently guide bias detection in other models. Using over 125k original–variant pairs derived from roughly 17k real financial news sentences, and combining input perturbation, atomic and intersectional attribute evaluation, and guided sampling, the method reveals 73% of FinMA's biased behaviors using only 20% of the test samples, substantially reducing detection costs.
📝 Abstract
Bias in financial language models constitutes a major obstacle to their adoption in real-world applications. Detecting such bias is challenging, as it requires identifying inputs whose predictions change when varying properties unrelated to the decision, such as demographic attributes. Existing approaches typically rely on exhaustive mutation and pairwise prediction analysis over large corpora, which is effective but computationally expensive, particularly for large language models, and can become impractical in continuous retraining and release processes. Aiming to reduce this cost, we conduct a large-scale study of bias in five financial language models, examining similarities in their bias tendencies across protected attributes and exploring cross-model-guided bias detection to identify bias-revealing inputs earlier. Our study uses approximately 17k real financial news sentences, mutated to construct over 125k original–mutant pairs. Results show that all models exhibit bias under both atomic (0.58%–6.05%) and intersectional (0.75%–5.97%) settings. Moreover, we observe consistent patterns in bias-revealing inputs across models, enabling substantial reuse and cost reduction in bias detection. For example, up to 73% of FinMA's biased behaviours can be uncovered using only 20% of the input pairs when guided by properties derived from DistilRoBERTa outputs.
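The mutation-and-comparison idea described above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the attribute lexicon, the helper names, and the stand-in classifier are all hypothetical, and a real study would use actual financial language models and far richer perturbations.

```python
# Illustrative sketch of mutation-based bias detection (hypothetical names,
# not the paper's code). A sentence is mutated by swapping a protected-attribute
# term; if the model's prediction differs between the original and the mutant,
# the (original, mutant) pair is counted as bias-revealing.
from typing import Callable, Dict, List, Tuple

# Hypothetical atomic attribute lexicon: each term maps to its swap counterpart.
GENDER_SWAPS: Dict[str, str] = {"he": "she", "she": "he", "his": "her", "her": "his"}

def mutate(sentence: str, swaps: Dict[str, str]) -> str:
    """Replace each protected-attribute token with its counterpart."""
    return " ".join(swaps.get(t.lower(), t) for t in sentence.split())

def find_bias_revealing_pairs(
    sentences: List[str],
    predict: Callable[[str], str],
    swaps: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Return (original, mutant) pairs whose predictions disagree."""
    revealing = []
    for s in sentences:
        m = mutate(s, swaps)
        if m != s and predict(s) != predict(m):
            revealing.append((s, m))
    return revealing

# Toy sentiment classifier standing in for a financial LM, deliberately
# sensitive to a gender term so the pair below is flagged.
def toy_predict(sentence: str) -> str:
    return "positive" if "he" in sentence.lower().split() else "negative"

pairs = find_bias_revealing_pairs(
    ["Analysts say he beat earnings expectations.",
     "The firm raised its guidance for Q3."],
    toy_predict,
    GENDER_SWAPS,
)
print(len(pairs))  # only the first sentence flips prediction after the swap
```

In the paper's setting, the expensive part is running `predict` on every pair for a large model; the cross-model guidance result says that pairs flagged by a cheap model such as DistilRoBERTa can be prioritised, so most of a larger model's bias-revealing pairs surface after scoring only a small fraction of the corpus.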