Hierarchical Federated Foundation Models over Wireless Networks for Multi-Modal Multi-Task Intelligence: Integration of Edge Learning with D2D/P2P-Enabled Fog Learning Architectures

📅 2025-09-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Federated foundation models for multi-modal, multi-task (M3T) learning in wireless edge/fog environments face two forms of heterogeneity: across modalities (e.g., sensor data, images, text) and across tasks (e.g., classification, detection, forecasting). Method: The paper proposes Hierarchical Federated Foundation Models (HF-FMs), which map the modular structure of an M3T foundation model (modality-specific encoders, prompts, Mixture-of-Experts (MoE) layers, lightweight adapters, and task-specific heads) onto the hierarchy of fog/edge infrastructure. Optional device-to-device (D2D)/peer-to-peer (P2P) links let nodes relay modules horizontally and train cooperatively with nearby peers. Contribution/Results: The authors prototype HF-FMs in a wireless network setting, outline future research directions, and release open-source code to encourage further work on M3T federated foundation models.

📝 Abstract

The rise of foundation models (FMs) has reshaped the landscape of machine learning. As these models continue to grow, leveraging geo-distributed data from wireless devices has become increasingly critical, giving rise to federated foundation models (FFMs). More recently, FMs have evolved into multi-modal multi-task (M3T) FMs (e.g., GPT-4) capable of processing diverse modalities across multiple tasks, which motivates a new underexplored paradigm: M3T FFMs. In this paper, we unveil an unexplored variation of M3T FFMs by proposing hierarchical federated foundation models (HF-FMs), which in turn expose two overlooked heterogeneity dimensions to fog/edge networks that have a direct impact on these emerging models: (i) heterogeneity in collected modalities and (ii) heterogeneity in executed tasks across fog/edge nodes. HF-FMs strategically align the modular structure of M3T FMs, comprising modality encoders, prompts, mixture-of-experts (MoEs), adapters, and task heads, with the hierarchical nature of fog/edge infrastructures. Moreover, HF-FMs support the optional use of device-to-device (D2D) communications, enabling horizontal module relaying and localized cooperative training among nodes when feasible. By examining the architectural design of HF-FMs, we highlight their unique capabilities along with a series of tailored future research directions. Finally, to demonstrate their potential, we prototype HF-FMs in a wireless network setting and release the open-source code for the development of HF-FMs with the goal of fostering exploration in this untapped field (GitHub: https://github.com/payamsiabd/M3T-FFM).
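
To make the idea of aligning FM modules with the network hierarchy concrete, the following is a minimal sketch, not the authors' implementation. It assumes a FedAvg-style weighted average as the aggregation rule (the abstract does not specify one), and every name in it (`aggregate`, `device_a`, `image_encoder`, `cls_head`, and so on) is hypothetical. Each node reports only the modules it actually holds, so devices with different modalities or tasks contribute to different parts of the model, and the same function is reused at the edge and cloud tiers.

```python
def aggregate(updates):
    """Weighted average per module (FedAvg-style; an assumed rule).

    Each update maps module_name -> (weight, params), where weight is the
    number of samples behind that module and params is a list of floats.
    A node only contributes to modules it holds. The result has the same
    shape, so the function can be applied again at the next tier.
    """
    sums, weights = {}, {}
    for update in updates:
        for name, (w, params) in update.items():
            if name not in sums:
                sums[name] = [0.0] * len(params)
                weights[name] = 0.0
            sums[name] = [s + w * p for s, p in zip(sums[name], params)]
            weights[name] += w
    return {
        name: (weights[name], [s / weights[name] for s in sums[name]])
        for name in sums
    }


# Hypothetical devices with different modalities and tasks.
device_a = {"image_encoder": (10, [1.0, 2.0]), "cls_head": (10, [0.0])}
device_b = {"text_encoder": (30, [4.0]), "cls_head": (30, [4.0])}
device_c = {"image_encoder": (30, [5.0, 6.0])}

# Tier 1: each edge server aggregates the devices attached to it.
edge_1 = aggregate([device_a, device_b])
edge_2 = aggregate([device_c])

# Tier 2: the cloud aggregates the edge-level results module by module.
global_model = aggregate([edge_1, edge_2])
```

Because each module carries its own weight, averaging in two tiers gives the same result as averaging all three devices at once, which is the property that makes the hierarchy transparent in this simplified setting.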

Problem

Research questions and friction points this paper is trying to address.

Integrating hierarchical federated learning with wireless networks for multi-modal intelligence

Addressing heterogeneity in collected modalities and executed tasks across fog nodes

Enabling device-to-device communications for cooperative training in edge environments
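
As a rough illustration of D2D cooperative training, the sketch below has each node average one shared module with its direct neighbors before any uplink transmission. This is plain neighbor (gossip-style) averaging used as a stand-in; the paper summary does not state which D2D update rule HF-FMs use, and the function name `d2d_round`, the node labels, and the chain topology are all hypothetical.

```python
def d2d_round(params, neighbors):
    """One round of neighbor averaging over D2D links (assumed rule).

    params:    node_id -> list of floats (one shared module per node)
    neighbors: node_id -> list of node_ids reachable over D2D
    Every node replaces its parameters with the unweighted mean of its own
    values and those of its reachable neighbors, using the values from the
    start of the round.
    """
    updated = {}
    for node, own in params.items():
        group = [own] + [params[m] for m in neighbors.get(node, []) if m in params]
        updated[node] = [sum(vals) / len(group) for vals in zip(*group)]
    return updated


# Hypothetical three-node chain: a <-> b <-> c.
module_params = {"a": [0.0], "b": [3.0], "c": [6.0]}
links = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

after_one_round = d2d_round(module_params, links)
```

Repeating the round pulls the three nodes toward a common value without involving an edge server, which is the kind of localized cooperation that D2D links make possible.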

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical federated foundation models for wireless networks

Integration of edge learning with D2D/P2P fog architectures

Modular structure alignment with M3T foundation models
