🤖 AI Summary
To address the challenge in federated learning (FL) markets where budget-constrained data consumers struggle to recruit sufficient data owners, leading to degraded model performance, this paper proposes a collaborative recruitment and training framework. The framework identifies shared subtasks across consumers via subtask clustering, constructs multi-consumer joint submodels, and employs ensemble knowledge distillation to fuse submodel knowledge into each consumer's global model, supported by a federated parameter coordination mechanism that ensures training stability. It establishes, for the first time, collaborative data utilization among multiple consumers in FL markets, introducing a novel three-tiered paradigm of "subtask discovery → joint training → distillation-based ensemble" that overcomes the limitations of conventional one-to-one matching. Evaluations on three benchmark datasets demonstrate average accuracy improvements of 12.7%–18.3% for participating consumers, significantly mitigating the performance degradation caused by restricted data access.
📝 Abstract
Federated learning (FL) allows machine learning models to be trained on distributed datasets without directly accessing local data. In FL markets, numerous Data Consumers compete to recruit Data Owners for their respective training tasks, but budget constraints and competition can prevent them from securing sufficient data. While existing solutions focus on optimizing one-to-one matching between Data Owners and Data Consumers, we propose methodname{}, a novel framework that facilitates collaborative recruitment and training for Data Consumers with similar tasks. Specifically, methodname{} detects shared subtasks among multiple Data Consumers and coordinates the joint training of submodels specialized for these subtasks. Then, through ensemble distillation, these submodels are integrated into each Data Consumer's global model. Experimental evaluations on three benchmark datasets demonstrate that restricting Data Consumers' access to Data Owners significantly degrades model performance; however, by incorporating methodname{}, this performance loss is effectively mitigated, resulting in substantial accuracy gains for all participating Data Consumers.
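The fusion step described above, where jointly trained submodels are integrated into a consumer's global model via ensemble distillation, can be sketched in its standard form as follows. This is a minimal NumPy illustration of generic ensemble knowledge distillation, not the paper's implementation; the probability-averaging ensemble, the KL objective, and the temperature value are assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_distillation_loss(student_logits, teacher_logits_list, T=2.0):
    """KL divergence between the averaged, temperature-softened predictions
    of the teacher ensemble (the submodels) and the softened predictions of
    the student (a consumer's global model)."""
    # Fuse the submodels' knowledge by averaging their soft probabilities.
    teacher_probs = np.mean([softmax(t, T) for t in teacher_logits_list], axis=0)
    student_log_probs = np.log(softmax(student_logits, T) + 1e-12)
    # KL(teacher || student), averaged over the batch.
    kl = np.sum(
        teacher_probs * (np.log(teacher_probs + 1e-12) - student_log_probs),
        axis=-1,
    )
    return float(np.mean(kl))
```

Minimizing this loss with respect to the student's parameters pulls each consumer's global model toward the fused predictions of the shared submodels; the loss is zero exactly when the student reproduces the ensemble's soft output.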