Clust-PSI-PFL: A Population Stability Index Approach for Clustered Non-IID Personalized Federated Learning

📅 2025-12-23
🤖 AI Summary
To address severe model bias and insufficient personalization in federated learning under non-IID data distributions, this paper proposes a client clustering framework grounded in the Population Stability Index (PSI). It introduces a weighted PSI metric, $WPSI^L$, which the authors report to quantify data distribution shift more informatively than conventional measures such as the Hellinger distance. Combined with the silhouette coefficient, the method selects the number of clusters adaptively, enabling lightweight, interpretable, and distribution-aware client grouping. Building on this clustering, the authors design a personalized federated learning (PFL) paradigm that facilitates intra-cluster collaborative optimization and inter-cluster knowledge transfer. Evaluation on six datasets spanning tabular, image, and text modalities shows up to 18% higher global accuracy than state-of-the-art baselines and a 37% relative improvement in client fairness under severe label skew.

📝 Abstract
Federated learning (FL) supports privacy-preserving, decentralized machine learning (ML) model training by keeping data on client devices. However, non-independent and identically distributed (non-IID) data across clients biases updates and degrades performance. To alleviate these issues, we propose Clust-PSI-PFL, a clustering-based personalized FL framework that uses the Population Stability Index (PSI) to quantify how far client data departs from IID. We compute a weighted PSI metric, $WPSI^L$, which we show to be more informative than common non-IID metrics (Hellinger, Jensen-Shannon, and Earth Mover's distance). Using PSI features, we form distributionally homogeneous groups of clients via K-means++; the optimal number of clusters is chosen by a systematic silhouette-based procedure, typically yielding few clusters with modest overhead. Across six datasets (tabular, image, and text modalities), two partition protocols (Dirichlet with parameter $\alpha$ and Similarity with parameter $S$), and multiple client sizes, Clust-PSI-PFL delivers up to 18% higher global accuracy than state-of-the-art baselines and improves client fairness by 37% in relative terms under severe non-IID data. These results establish PSI-guided clustering as a principled, lightweight mechanism for robust PFL under label skew.
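As a rough illustration of the quantity the abstract builds on, the sketch below computes the standard (unweighted) PSI between two discrete label distributions, $\mathrm{PSI} = \sum_i (a_i - e_i)\ln(a_i / e_i)$. The weighting used in $WPSI^L$ is not specified in this summary, so it is not reproduced here. Comparing each client's label proportions against a pooled reference distribution, the smoothing floor `eps`, and the toy data are all assumptions made for illustration.

```python
import math


def psi(expected, actual, eps=1e-6):
    """Standard Population Stability Index between two discrete distributions.

    Both inputs are per-class proportions of equal length.
    `eps` is an assumed smoothing floor that avoids log(0) when a
    class is absent from one of the distributions.
    """
    if len(expected) != len(actual):
        raise ValueError("distributions must have the same number of bins")
    total = 0.0
    for e, a in zip(expected, actual):
        e = max(e, eps)
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


def label_proportions(labels, num_classes):
    """Turn a list of integer class labels into per-class proportions."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    n = len(labels)
    return [c / n for c in counts]


# Hypothetical toy data: a balanced reference and two clients.
global_dist = [0.25, 0.25, 0.25, 0.25]
client_a = label_proportions([0, 1, 2, 3, 0, 1, 2, 3], 4)  # matches the reference
client_b = label_proportions([0, 0, 0, 0, 0, 0, 1, 1], 4)  # heavily label-skewed

psi_a = psi(global_dist, client_a)
psi_b = psi(global_dist, client_b)
print(f"client_a PSI = {psi_a:.4f}, client_b PSI = {psi_b:.4f}")
```

A client whose label proportions match the reference scores zero, and the score grows as the client's labels concentrate on fewer classes. The per-term product is unchanged when the two arguments are swapped, so this form of PSI is symmetric.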
Problem

Research questions and friction points this paper is trying to address.

Addresses performance degradation in federated learning due to non-IID data
Proposes a clustering framework using Population Stability Index to quantify data skew
Improves global accuracy and client fairness under severe non-IID conditions
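The abstract names a Dirichlet partition with parameter $\alpha$ as one of the protocols used to create non-IID clients. The sketch below shows one common way such a label-skewed split is simulated: for each class, client shares are drawn from a symmetric Dirichlet distribution, so a small $\alpha$ concentrates a class on a few clients. This is a generic illustration, not the paper's exact protocol; the function names, the rounding scheme, and the toy labels are assumptions.

```python
import random


def dirichlet_sample(alpha, n, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) over n entries."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    if total == 0.0:  # guard against numerical underflow for tiny alpha
        return [1.0 / n] * n
    return [d / total for d in draws]


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with per-class Dirichlet shares."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    clients = [[] for _ in range(num_clients)]
    for _, idxs in sorted(by_class.items()):
        rng.shuffle(idxs)
        shares = dirichlet_sample(alpha, num_clients, rng)
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += shares[c]
            # The last client takes whatever remains, so no index is dropped.
            end = len(idxs) if c == num_clients - 1 else round(cum * len(idxs))
            clients[c].extend(idxs[start:end])
            start = end
    return clients


# Hypothetical toy labels: 200 samples over 4 balanced classes.
labels = [i % 4 for i in range(200)]
parts = dirichlet_partition(labels, num_clients=5, alpha=0.1)
for c, idxs in enumerate(parts):
    counts = [sum(1 for i in idxs if labels[i] == k) for k in range(4)]
    print(f"client {c}: per-class counts = {counts}")
```

Rerunning with a large value such as `alpha=100` should give clients with nearly balanced class counts, which is the near-IID end of the same protocol.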
Innovation

Methods, ideas, or system contributions that make the work stand out.

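PSI quantifies the degree of non-IID data and supplies the features used for client clustering
Weighted PSI metric ($WPSI^L$) is reported to be more informative than Hellinger, Jensen-Shannon, and Earth Mover's distances
K-means++ clustering with a silhouette-selected number of clusters improves accuracy and fairness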
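The clustering step described above can be sketched as follows, assuming each client has already been reduced to a small PSI feature vector. The sketch uses only the Python standard library, implements K-means++ seeding followed by Lloyd iterations, and picks the candidate k with the highest mean silhouette score. The paper's "systematic" selection procedure is not detailed in this summary, so the plain argmax over candidates, the feature vectors, and all function names are assumptions.

```python
import math
import random


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_pp(points, k, rng, iters=50):
    """K-means with k-means++ seeding; returns one cluster label per point."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Sample the next centre proportionally to squared distance
        # from the nearest centre chosen so far.
        d2 = [min(dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep the centre of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters score 0 by convention."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for q, l in zip(points, labels) if l == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points, k_values, seed=0):
    """Return (k, score, labels) for the candidate k with the best silhouette."""
    rng = random.Random(seed)
    best = None
    for k in k_values:
        labels = kmeans_pp(points, k, rng)
        score = silhouette(points, labels)
        if best is None or score > best[1]:
            best = (k, score, labels)
    return best


# Hypothetical PSI feature vectors for 12 clients in three obvious groups.
points = [
    [0.02, 0.01], [0.03, 0.02], [0.01, 0.03], [0.02, 0.02],
    [1.00, 0.10], [1.05, 0.12], [0.98, 0.09], [1.02, 0.11],
    [0.10, 2.00], [0.12, 2.05], [0.09, 1.98], [0.11, 2.02],
]
best_k, best_score, best_labels = choose_k(points, range(2, 6))
print(best_k, round(best_score, 3), best_labels)
```

In practice a library implementation such as scikit-learn's `KMeans(init="k-means++")` with `silhouette_score` would likely replace the hand-written loops. The standard-library version is used here only to keep the example self-contained.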