Multimodal Federated Learning: A Survey through the Lens of Different FL Paradigms

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing research lacks a systematic taxonomy and analytical framework for multimodal federated learning (MFL) that is aligned with federated learning (FL) paradigms. This work addresses the challenges that multimodal data raises in horizontal, vertical, and hybrid FL settings, namely modality heterogeneity, privacy heterogeneity, and communication inefficiency. It proposes a unified, FL-aligned taxonomy for MFL that characterizes the core problems and the evolution of training algorithms in each paradigm. The survey draws on FL theory, multimodal representation learning, and distributed optimization to discuss key technical challenges, including cross-modal alignment and privacy-utility trade-offs. By filling this gap in the MFL literature, it offers a foundation and research roadmap for designing robust, efficient, and privacy-preserving multimodal federated systems.

📝 Abstract
Multimodal Federated Learning (MFL) lies at the intersection of two pivotal research areas: leveraging complementary information from multiple modalities to improve downstream inference performance and enabling distributed training to enhance efficiency and preserve privacy. Despite the growing interest in MFL, there is currently no comprehensive taxonomy that organizes MFL through the lens of different Federated Learning (FL) paradigms. This perspective is important because multimodal data introduces distinct challenges across various FL settings. These challenges, including modality heterogeneity, privacy heterogeneity, and communication inefficiency, are fundamentally different from those encountered in traditional unimodal or non-FL scenarios. In this paper, we systematically examine MFL within the context of three major FL paradigms: horizontal FL (HFL), vertical FL (VFL), and hybrid FL. For each paradigm, we present the problem formulation, review representative training algorithms, and highlight the most prominent challenge introduced by multimodal data in distributed settings. We also discuss open challenges and provide insights for future research. By establishing this taxonomy, we aim to uncover the novel challenges posed by multimodal data from the perspective of different FL paradigms and to offer a new lens through which to understand and advance the development of MFL.
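
To make the paradigm distinction concrete, here is a minimal sketch of how a toy multimodal dataset might be partitioned under HFL versus VFL. It is not taken from the paper: the sample IDs, the modality names (`image`, `text`), the client names (`A`, `B`), and both helper functions are hypothetical and chosen only for illustration. The sketch assumes the common reading in which HFL clients hold different samples and VFL clients hold different features (here, modalities) of the same samples. Hybrid FL, which mixes both kinds of partitioning, is not shown.

```python
# Illustrative only: a toy multimodal dataset split the HFL way and the VFL way.
# All sample IDs, modality names, and client names are hypothetical.

samples = {
    "s0": {"image": [0.1, 0.2], "text": [1.0]},
    "s1": {"image": [0.3, 0.4], "text": [0.0]},
    "s2": {"image": [0.5, 0.6], "text": [1.0]},
    "s3": {"image": [0.7, 0.8], "text": [0.0]},
}


def horizontal_split(data, sample_owner):
    """HFL-style split: each client holds a disjoint set of samples.

    For simplicity every sample keeps all of its modalities here; in practice
    clients may hold different modality subsets (modality heterogeneity).
    """
    clients = {}
    for sample_id, client in sample_owner.items():
        clients.setdefault(client, {})[sample_id] = dict(data[sample_id])
    return clients


def vertical_split(data, modality_owner):
    """VFL-style split: clients share sample IDs but each holds different modalities."""
    clients = {}
    for sample_id, modalities in data.items():
        for modality, features in modalities.items():
            client = modality_owner[modality]
            clients.setdefault(client, {}).setdefault(sample_id, {})[modality] = features
    return clients


hfl = horizontal_split(samples, {"s0": "A", "s1": "A", "s2": "B", "s3": "B"})
vfl = vertical_split(samples, {"image": "A", "text": "B"})

print(sorted(hfl["A"]))        # sample IDs held by client A under the HFL-style split
print(sorted(vfl["A"]["s0"]))  # modalities client A holds for sample s0 under the VFL-style split
```

In the HFL-style split the clients see disjoint samples, whereas in the VFL-style split every client sees every sample ID but only its own modality, which is why the abstract treats the two settings as raising different challenges.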
Problem

Research questions and friction points this paper is trying to address.

No comprehensive taxonomy organizes Multimodal Federated Learning (MFL) by FL paradigm
Multimodal data introduces modality heterogeneity, privacy heterogeneity, and communication inefficiency in distributed settings
MFL has not been examined systematically across the horizontal, vertical, and hybrid FL paradigms
Innovation

Methods, ideas, or system contributions that make the work stand out.

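Examines how multimodal data affects each FL paradigm: problem formulation, representative training algorithms, and the most prominent challenge
Treats modality heterogeneity, privacy heterogeneity, and communication inefficiency as the central challenges
Provides a systematic taxonomy covering horizontal, vertical, and hybrid FL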