🤖 AI Summary
This paper addresses Class-Imbalanced Federated Source-Free Domain Adaptation (CI-FFREEDA), a setting in which both source and target domains exhibit class imbalance, the target domain lacks labels, and clients suffer from cross-client non-IID data distributions coupled with label shift. The authors propose a framework that leverages a frozen large-scale pre-trained vision foundation model (e.g., a ViT) as a fixed feature extractor. Crucially, it eliminates fine-tuning and complex domain-alignment modules, enabling efficient knowledge transfer without accessing source labels or updating backbone parameters. The method integrates federated learning with unsupervised domain adaptation under strict privacy and communication constraints. Experiments demonstrate significant mitigation of both inter-domain distribution shift and class imbalance, yielding substantial improvements in overall accuracy. Moreover, the approach exhibits strong generalization and robustness across multiple non-IID target clients. To the authors' knowledge, this is the first work to directly employ frozen vision foundation models for CI-FFREEDA, markedly reducing communication and computational overhead while maintaining high adaptability.
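The communication savings follow from the architecture: since the frozen VFM backbone is identical on every client and never updated, only the parameters of a lightweight head need to be trained locally and averaged by the server. A minimal sketch of that aggregation step, assuming a standard sample-weighted FedAvg (the names `ClientHead` and `fedavg` are illustrative, not from the paper):

```python
# Sketch: only the small classifier head travels between clients and
# server; the frozen VFM backbone stays fixed and local to each client.
from dataclasses import dataclass
from typing import List

@dataclass
class ClientHead:
    weights: List[float]  # flattened parameters of the lightweight head
    n_samples: int        # local dataset size, used to weight the average

def fedavg(heads: List[ClientHead]) -> List[float]:
    """Sample-weighted FedAvg over classifier-head parameters only."""
    total = sum(h.n_samples for h in heads)
    dim = len(heads[0].weights)
    return [
        sum(h.weights[i] * h.n_samples for h in heads) / total
        for i in range(dim)
    ]

clients = [
    ClientHead(weights=[1.0, 2.0], n_samples=10),
    ClientHead(weights=[3.0, 4.0], n_samples=30),
]
print(fedavg(clients))  # → [2.5, 3.5]
```

Because the payload is a head with a few thousand parameters rather than a full ViT backbone, per-round communication cost drops by orders of magnitude regardless of the aggregation rule used.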
📝 Abstract
Federated Learning (FL) offers a framework for training models collaboratively while preserving the data privacy of each client. Recently, research has focused on Federated Source-Free Domain Adaptation (FFREEDA), a more realistic scenario wherein client-held target domain data remains unlabeled and the server can access source domain data only during pre-training. We extend this framework to a more complex and realistic setting: Class-Imbalanced FFREEDA (CI-FFREEDA), which takes into account class imbalances in both the source and target domains, as well as label shifts between source and target and among target clients. Replicating existing methods in our experimental setup led us to shift the focus from enhancing aggregation and domain adaptation methods to improving the feature extractor within the network itself. We propose replacing the FFREEDA backbone with a frozen vision foundation model (VFM), thereby improving overall accuracy without extensive parameter tuning and reducing computational and communication costs in federated learning. Our experimental results demonstrate that VFMs effectively mitigate the effects of domain gaps, class imbalances, and even non-IID-ness among target clients, suggesting that strong feature extractors, not complex adaptation or FL methods, are key to success in real-world FL.