🤖 AI Summary
To address model bias and performance degradation in federated learning (FL) under non-IID settings—caused by skewed data distributions across clients—this paper proposes a client selection method based on the Population Stability Index (PSI), reported as the first application of PSI in FL. The approach establishes a statistically grounded, interpretable, and lightweight heterogeneity-aware sampling mechanism that dynamically selects clients with more consistent data distributions for training. By mitigating the effects of label skew, it jointly optimizes global accuracy and local fairness while preserving privacy. Extensive experiments across tabular, image, and text modalities under diverse non-IID configurations show that the method achieves up to a 10% improvement in global accuracy over state-of-the-art baselines, alongside a significant reduction in client-level performance variance. These results support the generalizability and robustness of the proposed framework.
📝 Abstract
Federated Learning (FL) enables decentralized machine learning (ML) model training while preserving data privacy by keeping data localized on clients. However, non-independent and identically distributed (non-IID) data across clients poses a significant challenge, leading to skewed model updates and performance degradation. To address this, we propose PSI-PFL, a novel client selection framework for Personalized Federated Learning (PFL) that leverages the Population Stability Index (PSI) to quantify and mitigate data heterogeneity (i.e., non-IIDness). Our approach selects more homogeneous clients based on PSI, reducing the impact of label skew, one of the most detrimental factors in FL performance. Experimental results over multiple data modalities (tabular, image, text) demonstrate that PSI-PFL significantly improves global model accuracy, outperforming state-of-the-art baselines by up to 10% under non-IID scenarios while ensuring fairer local performance. PSI-PFL enhances FL performance and offers practical benefits in applications where data privacy and heterogeneity are critical.
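To make the core idea concrete, the sketch below shows one plausible way to score clients by PSI against the aggregate label distribution and keep the most homogeneous ones. This is an illustrative reconstruction, not the paper's implementation: the function names (`psi`, `select_clients`), the smoothing constant `eps`, and the selection rule (lowest-PSI clients) are assumptions based on the standard PSI definition, PSI = Σᵢ (pᵢ − qᵢ) · ln(pᵢ / qᵢ).

```python
import numpy as np

def psi(client_dist, global_dist, eps=1e-6):
    """Population Stability Index between two discrete label distributions.

    PSI = sum_i (p_i - q_i) * ln(p_i / q_i); 0 means identical distributions,
    larger values mean stronger divergence (more label skew).
    `eps` is an assumed smoothing constant to avoid log(0) on empty classes.
    """
    p = np.asarray(client_dist, dtype=float) + eps
    q = np.asarray(global_dist, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum((p - q) * np.log(p / q)))

def select_clients(client_label_counts, num_selected):
    """Rank clients by PSI against the pooled label distribution and
    return the indices of the `num_selected` most homogeneous clients."""
    counts = np.asarray(client_label_counts, dtype=float)
    global_dist = counts.sum(axis=0)          # pooled label histogram
    scores = [psi(c, global_dist) for c in counts]
    order = np.argsort(scores)                # ascending: lowest PSI first
    return sorted(order[:num_selected].tolist())
```

Under this reading, a client whose local labels are heavily concentrated in one class gets a high PSI score and is deprioritized, which is how the selection mechanism dampens label skew in the aggregated updates.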