🤖 AI Summary
To address the challenge in federated learning (FL) markets where budget-constrained data consumers struggle to recruit sufficient data owners, leading to degraded model performance, this paper proposes a collaborative recruitment and training framework. The framework identifies shared subtasks across consumers via subtask clustering, constructs multi-consumer joint submodels, and employs ensemble knowledge distillation to fuse submodel knowledge into each consumer's global model, supported by a federated parameter coordination mechanism that ensures training stability. It establishes, for the first time, collaborative data utilization among multiple consumers in FL markets, introducing a novel three-tiered paradigm of "subtask discovery → joint training → distillation-based ensemble" that overcomes the limitations of conventional one-to-one matching. Evaluations on three benchmark datasets demonstrate average accuracy improvements of 12.7%–18.3% for participating consumers, significantly mitigating the performance degradation caused by restricted data access.
📝 Abstract
Federated learning (FL) allows machine learning models to be trained on distributed datasets without directly accessing local data. In FL markets, numerous Data Consumers compete to recruit Data Owners for their respective training tasks, but budget constraints and competition can prevent them from securing sufficient data. While existing solutions focus on optimizing one-to-one matching between Data Owners and Data Consumers, we propose methodname{}, a novel framework that facilitates collaborative recruitment and training for Data Consumers with similar tasks. Specifically, methodname{} detects shared subtasks among multiple Data Consumers and coordinates the joint training of submodels specialized for these subtasks. Then, through ensemble distillation, these submodels are integrated into each Data Consumer's global model. Experimental evaluations on three benchmark datasets demonstrate that restricting Data Consumers' access to Data Owners significantly degrades model performance; however, by incorporating methodname{}, this performance loss is effectively mitigated, resulting in substantial accuracy gains for all participating Data Consumers.
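The fusion step described above, where jointly trained submodels are integrated into a consumer's global model via ensemble distillation, can be sketched in its standard form as follows. This is a minimal NumPy illustration of generic ensemble knowledge distillation, not the paper's implementation; the probability-averaging ensemble, the KL objective, and the temperature value are assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_distillation_loss(student_logits, teacher_logits_list, T=2.0):
    """KL divergence between the averaged, temperature-softened predictions
    of the teacher ensemble (the submodels) and the softened predictions of
    the student (a consumer's global model)."""
    # Fuse the submodels' knowledge by averaging their soft probabilities.
    teacher_probs = np.mean([softmax(t, T) for t in teacher_logits_list], axis=0)
    student_log_probs = np.log(softmax(student_logits, T) + 1e-12)
    # KL(teacher || student), averaged over the batch.
    kl = np.sum(
        teacher_probs * (np.log(teacher_probs + 1e-12) - student_log_probs),
        axis=-1,
    )
    return float(np.mean(kl))
```

Minimizing this loss with respect to the student's parameters pulls each consumer's global model toward the fused predictions of the shared submodels; the loss is zero exactly when the student reproduces the ensemble's soft output.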