🤖 AI Summary
This work investigates the impact of data heterogeneity on generalization error in single-round federated learning. Addressing multi-client distributed learning, we extend the conditional mutual information (CMI) generalization analysis framework to arbitrary numbers of clients for the first time, and—leveraging rate-distortion theory—derive a “lossy” generalization upper bound that explicitly characterizes the dependence of generalization error on data distribution divergence (heterogeneity degree). Theoretically, we prove that, for models such as D-SVM, moderate data heterogeneity reduces both expected and tail generalization errors—challenging the conventional belief that heterogeneity inherently harms generalization. Extensive experiments under diverse non-i.i.d. data distributions corroborate the robustness of this phenomenon. Our key contributions are: (i) the first CMI-based generalization bound explicitly coupled with heterogeneity degree; and (ii) the revelation that data heterogeneity can act as a regularizer to improve generalization performance.
📝 Abstract
In this paper, we investigate the effect of data heterogeneity across clients on the performance of distributed learning systems, i.e., one-round Federated Learning, as measured by the associated generalization error. Specifically, $K$ clients each have $n$ training samples generated independently according to a possibly different data distribution, and their individually chosen models are aggregated by a central server. We study the effect of the discrepancy between the clients' data distributions on the generalization error of the aggregated model. First, we establish in-expectation and tail upper bounds on the generalization error in terms of the distributions. In part, the bounds extend the popular Conditional Mutual Information (CMI) bound, which was developed for the centralized learning setting, i.e., $K=1$, to the distributed learning setting with an arbitrary number of clients $K \geq 1$. Then, we use a connection with information-theoretic rate-distortion theory to derive possibly tighter *lossy* versions of these bounds. Next, we apply our lossy bounds to study the effect of data heterogeneity across clients on the generalization error for a distributed classification problem in which each client uses Support Vector Machines (D-SVM). In this case, we establish generalization error bounds that depend explicitly on the degree of data heterogeneity. The bounds get smaller as the degree of data heterogeneity across clients gets higher, suggesting that D-SVM generalizes better when the dissimilarity between the clients' training samples is greater. This finding, which goes beyond D-SVM, is validated through a number of experiments.
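To make the setting concrete, the following is a minimal sketch (not from the paper) of one-round federated learning: $K$ clients, each with $n$ samples from a possibly shifted distribution, fit a local linear SVM, and the server aggregates by averaging the weight vectors. The training routine, hyperparameters, and synthetic heterogeneous data are all illustrative assumptions, not the paper's D-SVM construction.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.1, epochs=200, lr=0.01):
    """Fit a linear SVM (hinge loss + L2 penalty) by subgradient descent; y in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        margins = y * (X @ w)
        mask = margins < 1                           # samples violating the margin
        grad = lam * w - (y[mask, None] * X[mask]).sum(axis=0) / len(y)
        w -= lr * grad
    return w

def one_round_federated_svm(client_datasets, **svm_kwargs):
    """One communication round: each client trains locally; server averages the models."""
    local_models = [train_linear_svm(X, y, **svm_kwargs) for X, y in client_datasets]
    return np.mean(local_models, axis=0)

# Toy heterogeneous setup: each client's class means are perturbed by a per-client shift.
rng = np.random.default_rng(0)
K, n, d = 5, 100, 2
clients = []
for _ in range(K):
    shift = rng.normal(scale=0.5, size=d)            # per-client distribution shift
    y = rng.choice([-1, 1], size=n)
    X = y[:, None] * 1.5 + shift + rng.normal(size=(n, d))
    clients.append((X, y))

w_agg = one_round_federated_svm(clients)
X_all = np.vstack([X for X, _ in clients])
y_all = np.concatenate([y for _, y in clients])
acc = np.mean(np.sign(X_all @ w_agg) == y_all)
print(f"aggregated model accuracy on pooled data: {acc:.2f}")
```

Varying the `scale` of the per-client shift is a simple way to dial the degree of heterogeneity and observe its effect on the aggregated model empirically.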