🤖 AI Summary
This work addresses the challenge of unreliable uncertainty quantification in federated learning caused by dual heterogeneity—in both data distributions and model updates—which often leads to overconfident predictions and silent local failures at edge nodes. To tackle this, the authors propose FedWQ-CP, the first method within a federated conformal prediction framework to jointly handle both sources of heterogeneity. FedWQ-CP uses a single-round communication protocol: each participating client calibrates a local quantile threshold, and the server aggregates these thresholds via a weighted average to construct a global threshold. This approach ensures valid empirical coverage at both the global and local levels while significantly tightening prediction sets (for classification) or prediction intervals (for regression). Experiments on seven public datasets demonstrate that FedWQ-CP achieves reliable coverage with high efficiency and minimal communication overhead.
📝 Abstract
Federated learning (FL) faces challenges in uncertainty quantification (UQ). Without reliable UQ, FL systems risk deploying overconfident models at under-resourced agents, leading to silent local failures despite seemingly satisfactory global performance. Existing federated UQ approaches often address data heterogeneity or model heterogeneity in isolation, overlooking their joint effect on coverage reliability across agents. Conformal prediction is a widely used distribution-free UQ framework, yet its application in heterogeneous FL settings remains underexplored. We present FedWQ-CP, a simple yet effective approach that balances empirical coverage with efficiency at both the global and agent levels under dual heterogeneity. FedWQ-CP performs agent-server calibration in a single communication round. On each agent, conformity scores are computed on calibration data and a local quantile threshold is derived. Each agent then transmits only its quantile threshold and calibration sample size to the server. The server aggregates these thresholds through a weighted average to produce a global threshold. Experimental results on seven public datasets, covering both classification and regression, demonstrate that FedWQ-CP empirically maintains agent-wise and global coverage while producing the smallest prediction sets or intervals.
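The single-round protocol described above can be sketched in a few lines. This is a hypothetical illustration, not the authors' implementation: the function names, the finite-sample quantile correction (standard in split conformal prediction), and the choice of calibration-size weights are all assumptions made for the sketch.

```python
import numpy as np

def local_threshold(scores, alpha):
    """Agent side: derive a local conformal quantile threshold from
    conformity scores computed on the agent's calibration data."""
    n = len(scores)
    # Finite-sample-corrected quantile level, as used in split conformal
    # prediction; capped at 1.0 for small calibration sets.
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    # "higher" makes the empirical quantile conservative (no interpolation).
    return float(np.quantile(scores, level, method="higher")), n

def server_aggregate(thresholds_and_sizes):
    """Server side: weighted average of local thresholds, here weighted
    by calibration sample size (an assumed weighting scheme)."""
    q = np.array([t for t, _ in thresholds_and_sizes])
    w = np.array([n for _, n in thresholds_and_sizes], dtype=float)
    return float(np.sum(q * w) / np.sum(w))

# Example: three agents with heterogeneous score distributions and
# unequal calibration sizes, calibrated in one communication round.
rng = np.random.default_rng(0)
alpha = 0.1  # target 90% coverage
local_results = [local_threshold(rng.normal(mu, 1.0, size=n), alpha)
                 for mu, n in [(0.0, 200), (0.5, 100), (1.0, 50)]]
q_global = server_aggregate(local_results)
```

Each agent uploads only two scalars (its threshold and calibration size), which is what keeps the communication overhead minimal: the cost is independent of model size and calibration-set size.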