Distributed LLMs and Multimodal Large Language Models: A Survey on Advances, Challenges, and Future Directions

📅 2025-03-20
🤖 AI Summary
This paper addresses core challenges in scalability, privacy preservation, and edge-resource coordination for distributed large language models (LLMs) and multimodal LLMs (MLLMs). It systematically surveys decentralized techniques across training, inference, fine-tuning, and deployment, and introduces the first comprehensive taxonomy for distributed MLLMs, spanning six dimensions: data/model/pipeline parallelism, federated learning, edge-coordinated inference, multimodal alignment and compression, and privacy-enhancing computation (e.g., differential privacy and secure aggregation). The analysis identifies 12 critical bottlenecks, including weak robustness, insufficient privacy guarantees, and a lack of cross-modal edge coordination. Key contributions include: (1) the first holistic architectural map of distributed MLLM technologies; (2) seven actionable research directions; and (3) a paradigm for scalable, secure distributed AI with cross-modal fusion, providing both theoretical foundations and practical guidelines for industrial-grade deployment.
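To ground the privacy-enhancing computation dimension, here is a minimal sketch of differentially private federated averaging, one standard recipe combining federated learning with differential privacy. This is an illustration, not the paper's method; the names dp_federated_average, clip_norm, and noise_std are assumptions.

```python
import numpy as np

def dp_federated_average(local_updates, clip_norm=1.0, noise_std=0.1, rng=None):
    """Average client updates with per-client clipping and Gaussian noise.

    Illustrative sketch (not from the surveyed paper): clipping bounds each
    client's influence; noise masks any single client's contribution.
    """
    rng = rng or np.random.default_rng()
    clipped = []
    for update in local_updates:
        norm = np.linalg.norm(update)
        # Scale down any update whose L2 norm exceeds the clipping bound.
        clipped.append(update * min(1.0, clip_norm / (norm + 1e-12)))
    aggregate = np.mean(clipped, axis=0)
    # Noise scales with the sensitivity of the clipped mean (clip_norm / n).
    noise = rng.normal(0.0, noise_std * clip_norm / len(local_updates),
                       size=aggregate.shape)
    return aggregate + noise

# Example: three clients, each contributing a 4-parameter update.
updates = [np.random.default_rng(i).normal(size=4) for i in range(3)]
print(dp_federated_average(updates))
```

In a full deployment this would typically be paired with secure aggregation, so the server only ever sees the noisy aggregate rather than any individual client's update.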

📝 Abstract
Language models (LMs) are machine learning models designed to predict linguistic patterns by estimating the probability of word sequences from large-scale datasets, such as text corpora. LMs have a wide range of applications in natural language processing (NLP) tasks, including autocomplete and machine translation. Although larger datasets typically enhance LM performance, scalability remains a challenge due to constraints in computational power and resources. Distributed computing strategies offer essential solutions for improving scalability and managing the growing computational demand. Further, the use of sensitive datasets in training and deployment raises significant privacy concerns. Recent research has focused on developing decentralized techniques that enable distributed training and inference while utilizing diverse computational resources and enabling edge AI. This paper presents a survey of distributed solutions for various LMs, including large language models (LLMs), vision language models (VLMs), multimodal LLMs (MLLMs), and small language models (SLMs). While LLMs focus on processing and generating text, MLLMs are designed to handle multiple modalities of data (e.g., text, images, and audio) and to integrate them for broader applications. To this end, this paper reviews key advancements across the MLLM pipeline, including distributed training, inference, fine-tuning, and deployment, while also identifying contributions, limitations, and areas for future improvement. Further, it categorizes the literature based on six primary focus areas of decentralization. Our analysis describes gaps in current methodologies for enabling distributed solutions for LMs and outlines future research directions, emphasizing the need for novel solutions to enhance the robustness and applicability of distributed LMs.
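To make the opening definition concrete, here is a toy bigram language model that estimates the probability of a word sequence from counts; the corpus and all names are hypothetical illustrations, not material from the survey.

```python
from collections import Counter

# Toy corpus; a real LM estimates these statistics from a large-scale corpus.
corpus = "the cat sat on the mat the cat ran".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(w1, w2):
    """Maximum-likelihood estimate of P(w2 | w1)."""
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

def sequence_prob(words):
    """Probability of a word sequence under the bigram model."""
    p = 1.0
    for w1, w2 in zip(words, words[1:]):
        p *= bigram_prob(w1, w2)
    return p

print(sequence_prob("the cat sat".split()))  # P(cat|the) * P(sat|cat) = 2/3 * 1/2
```

An LLM replaces these counts with a neural network trained on a massive corpus, but the quantity being estimated, the probability of the next token given its context, is the same.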
Problem

Research questions and friction points this paper is trying to address.

Address scalability challenges in distributed large language models
Explore privacy concerns with sensitive data in LM training
Survey decentralized techniques for multimodal LM training and inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributed computing for scalable language models (see the sketch after this list)
Decentralized techniques for privacy and edge AI
Multimodal data integration in distributed LLMs
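As a sketch of the distributed-computing innovation above, the following shows synchronous data-parallel SGD, the basic pattern behind the data-parallelism dimension of the taxonomy. NumPy averaging stands in for a cluster all-reduce, and every name here (data_parallel_step, grad_fn) is a hypothetical stand-in.

```python
import numpy as np

def data_parallel_step(params, worker_batches, grad_fn, lr=0.01):
    """One synchronous data-parallel SGD step.

    Each worker computes a gradient on its own data shard; gradients are
    averaged (an all-reduce in a real cluster) and every replica applies
    the identical update, keeping model copies in sync.
    """
    grads = [grad_fn(params, batch) for batch in worker_batches]
    avg_grad = np.mean(grads, axis=0)  # stands in for all-reduce
    return params - lr * avg_grad

# Toy example: least-squares gradient on each worker's shard.
def grad_fn(w, batch):
    X, y = batch
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
shards = [(rng.normal(size=(8, 3)), rng.normal(size=8)) for _ in range(4)]
w = np.zeros(3)
for _ in range(100):
    w = data_parallel_step(w, shards, grad_fn)
print(w)
```

Model and pipeline parallelism instead split the network itself across devices, by layer groups or tensor shards, which is what makes serving models too large for any single accelerator feasible.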
Hadi Amini
Knight Foundation School of Computing and Information Sciences, Florida International University; Security, Optimization, and Learning for InterDependent Networks Laboratory (solid lab), FIU
Md Jueal Mia
Graduate Research Assistant, Knight Foundation School of Computing and Information Sciences, FIU
Privacy and Security · Federated Learning · Machine Learning · Large Language Model
Yasaman Saadati
Knight Foundation School of Computing and Information Sciences, Florida International University; Security, Optimization, and Learning for InterDependent Networks Laboratory (solid lab), FIU
Ahmed Imteaj
Assistant Professor, Florida Atlantic University
Robust and Secure AI · Multimodal LLMs · Federated Learning · Cybersecurity
Seyedsina Nabavirazavi
Knight Foundation School of Computing and Information Sciences, Florida International University
Urmish Thakker
Deep Learning Research, SambaNova Systems
Md Zarif Hossain
PhD student, Florida Atlantic University, Boca Raton
Vision Language Model · Secure AI · Federated Learning · Computer Vision
Awal Ahmed Fime
School of Computing, Southern Illinois University
S. S. Iyengar
Knight Foundation School of Computing and Information Sciences, Florida International University