🤖 AI Summary
Conventional meta-federated learning (Meta-FL) suffers from suboptimal personalization under high data heterogeneity, as it requires multiple local fine-tuning steps yet optimizes only single-step adaptation performance.
Method: We propose a generalized Meta-FL framework that redefines the meta-objective as minimizing the average loss after ν-step fine-tuning—enabling flexible, multi-step personalized adaptation. Our approach formulates a bi-level optimization problem and introduces an enhanced FedAvg variant tailored to this objective.
Contribution/Results: We provide the first rigorous convergence analysis for this framework under non-convex settings, covering both exact and approximate gradient scenarios. Experiments on real-world datasets demonstrate significant accuracy improvements, 37% faster convergence than baseline methods, and strong robustness to varying degrees of data heterogeneity.
📝 Abstract
Meta federated learning (FL) is a personalized variant of FL, where multiple agents collaborate on training an initial shared model without exchanging raw data samples. The initial model should be trained in a way that current or new agents can easily adapt it to their local datasets after one or a few fine-tuning steps, thus improving the model personalization. Conventional meta FL approaches minimize the average loss of agents on the local models obtained after one step of fine-tuning. In practice, agents may need to apply several fine-tuning steps to adapt the global model to their local data, especially under highly heterogeneous data distributions across agents. To this end, we present a generalized framework for the meta FL by minimizing the average loss of agents on their local model after any arbitrary number $
u$ of fine-tuning steps. For this generalized framework, we present a variant of the well-known federated averaging (FedAvg) algorithm and conduct a comprehensive theoretical convergence analysis to characterize the convergence speed as well as behavior of the meta loss functions in both the exact and approximated cases. Our experiments on real-world datasets demonstrate superior accuracy and faster convergence for the proposed scheme compared to conventional approaches.