🤖 AI Summary
Problem: Federated foundation models for multi-modal multi-task (M3T) learning in wireless fog/edge environments face two forms of heterogeneity: across modalities (e.g., sensors, images, text) and across tasks (e.g., classification, detection, forecasting), which can hinder convergence and generalization. Method: We propose hierarchical federated foundation models (HF-FMs), which map the modular structure of an M3T foundation model (modality encoders, prompts, mixture-of-experts (MoE) layers, adapters, and task heads) onto the hierarchy of fog/edge infrastructure, with optional device-to-device (D2D) links for relaying modules between nodes and for localized cooperative training. Contribution/Results: We describe the architectural design of HF-FMs and a set of tailored future research directions, prototype HF-FMs in a wireless network setting, and release the code as open source to encourage research on multimodal federated learning at the edge.
📝 Abstract
The rise of foundation models (FMs) has reshaped the landscape of machine learning. As these models continue to grow, leveraging geo-distributed data from wireless devices has become increasingly critical, giving rise to federated foundation models (FFMs). More recently, FMs have evolved into multi-modal multi-task (M3T) FMs (e.g., GPT-4) capable of processing diverse modalities across multiple tasks, which motivates a new, underexplored paradigm: M3T FFMs. In this paper, we introduce an unexplored variation of M3T FFMs by proposing hierarchical federated foundation models (HF-FMs), which in turn expose two overlooked heterogeneity dimensions in fog/edge networks that directly affect these emerging models: (i) heterogeneity in the modalities collected and (ii) heterogeneity in the tasks executed across fog/edge nodes. HF-FMs align the modular structure of M3T FMs, comprising modality encoders, prompts, mixture-of-experts (MoEs), adapters, and task heads, with the hierarchical nature of fog/edge infrastructures. Moreover, HF-FMs support the optional use of device-to-device (D2D) communications, allowing horizontal module relaying and localized cooperative training among nodes when feasible. By examining the architectural design of HF-FMs, we highlight their unique capabilities along with a series of tailored future research directions. Finally, to demonstrate their potential, we prototype HF-FMs in a wireless network setting and release the open-source code for the development of HF-FMs, with the goal of fostering exploration in this untapped field (GitHub: https://github.com/payamsiabd/M3T-FFM).
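To make the idea of aligning model modules with a fog/edge hierarchy more concrete, the following is a minimal sketch of module-wise hierarchical averaging under modality and task heterogeneity. It is not taken from the released repository: the function names (`average_modules`, `hierarchical_round`), the module names, the toy parameter vectors, and the choice of sample-count-weighted averaging are all assumptions made for illustration. Each node reports only the modules it actually trains, and a module is averaged only over the nodes that hold it, first at each edge server and then at the top tier.

```python
# Illustrative sketch only: names, data layout, and the weighted-averaging
# rule are assumptions, not the HF-FM authors' implementation.

def average_modules(updates):
    """Weighted, module-wise average.

    updates: list of {module_name: (weight, [param, ...])}.
    A module is averaged only over the entries that contain it.
    Returns a dict of the same shape, carrying the summed weight per module
    so that the result can be aggregated again at a higher tier.
    """
    sums, totals = {}, {}
    for modules in updates:
        for name, (weight, params) in modules.items():
            acc = sums.setdefault(name, [0.0] * len(params))
            for i, p in enumerate(params):
                acc[i] += weight * p
            totals[name] = totals.get(name, 0.0) + weight
    return {
        name: (totals[name], [s / totals[name] for s in acc])
        for name, acc in sums.items()
    }


def hierarchical_round(clusters):
    """Aggregate within each edge cluster, then across edge servers."""
    edge_models = [average_modules(nodes) for nodes in clusters.values()]
    return average_modules(edge_models)


# Hypothetical nodes: the weight is the local sample count, and each node
# holds only the modules matching its modalities and tasks.
clusters = {
    "edge_a": [
        {"image_encoder": (10, [1.0, 2.0]), "cls_head": (10, [0.0])},
        {"image_encoder": (30, [3.0, 4.0]), "text_encoder": (30, [1.0])},
    ],
    "edge_b": [
        {"text_encoder": (20, [5.0]), "cls_head": (20, [3.0])},
    ],
}

global_model = hierarchical_round(clusters)
for name, (weight, params) in sorted(global_model.items()):
    print(name, weight, params)
```

Because each tier passes the accumulated per-module weight upward, the two-level result matches a flat weighted average over all nodes holding a given module. A D2D variant could apply `average_modules` among neighboring nodes before anything is sent to the edge server, though how the paper's prototype schedules such exchanges is not specified in the abstract.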