MM-Telco: Benchmarks and Multimodal Large Language Models for Telecom Applications

📅 2025-11-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit limited adaptability to telecom-specific tasks, particularly network optimization, automated fault diagnosis, intelligent customer service, and compliance auditing, owing to insufficient domain grounding and multimodal reasoning capability. Method: The authors introduce MM-Telco, a multimodal benchmark suite tailored to the telecom domain, covering text understanding, image analysis, and cross-modal reasoning. Their approach integrates domain knowledge via supervised fine-tuning of both LLMs and vision-language models (VLMs), enabling joint modeling and retrieval over heterogeneous data. Contribution/Results: Fine-tuned models achieve an average 12.7% accuracy gain across MM-Telco subtasks, substantially outperforming general-purpose baselines. Moreover, the benchmark exposes systematic deficiencies in existing models, especially in topology-diagram comprehension and log-alert alignment, thereby establishing a reproducible evaluation framework and a concrete roadmap for advancing telecom AI.

📝 Abstract
Large Language Models (LLMs) have emerged as powerful tools for automating complex reasoning and decision-making tasks. In telecommunications, they hold the potential to transform network optimization, automate troubleshooting, enhance customer support, and ensure regulatory compliance. However, their deployment in telecom is hindered by domain-specific challenges that demand specialized adaptation. To overcome these challenges and accelerate the adaptation of LLMs for telecom, we propose MM-Telco, a comprehensive suite of multimodal benchmarks and models tailored to the telecom domain. The benchmark introduces a range of text-based and image-based tasks that address practical, real-life use cases such as network operations, network management, improving documentation quality, and retrieval of relevant text and images. Further, we perform baseline experiments with various LLMs and VLMs. Models fine-tuned on our dataset exhibit a significant boost in performance. Our experiments also help analyze the weak areas of current state-of-the-art multimodal LLMs, thus guiding further development and research.
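The abstract reports benchmark results as per-subtask accuracy and an average improvement of fine-tuned models over general-purpose baselines. A minimal sketch of that kind of aggregation is shown below; the subtask names and score values are illustrative assumptions, not the paper's actual data or evaluation code.

```python
# Hypothetical sketch of benchmark score aggregation.
# Subtask names and numbers are made up for illustration;
# they are NOT MM-Telco's real subtasks or results.

def subtask_accuracy(preds, golds):
    """Exact-match accuracy for a single benchmark subtask."""
    assert len(preds) == len(golds), "prediction/gold length mismatch"
    correct = sum(p == g for p, g in zip(preds, golds))
    return correct / len(golds)

def average_gain(baseline_scores, finetuned_scores):
    """Mean accuracy improvement across subtasks, in absolute points."""
    gains = [f - b for b, f in zip(baseline_scores, finetuned_scores)]
    return sum(gains) / len(gains)

if __name__ == "__main__":
    # Illustrative per-subtask accuracies (assumed values).
    baseline = {"text_qa": 0.62, "image_qa": 0.48, "retrieval": 0.55}
    finetuned = {"text_qa": 0.74, "image_qa": 0.61, "retrieval": 0.66}
    gain = average_gain(list(baseline.values()), list(finetuned.values()))
    print(f"average gain over baseline: {gain:.3f}")
```

Averaging absolute accuracy gains across heterogeneous subtasks, as sketched here, is one common way to produce a single headline number like the 12.7% figure quoted in the summary; per-subtask breakdowns remain necessary to locate specific weaknesses such as diagram comprehension.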
Problem

Research questions and friction points this paper is trying to address.

Adapting large language models for telecom-specific challenges and applications
Developing multimodal benchmarks for network operations and management tasks
Addressing performance limitations in current multimodal LLMs for telecom use cases
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal benchmarks tailored for telecom domain
Fine-tuned models showing significant performance improvements
Addressing network operations through text and image tasks
Gagan Raj Gupta
IIT Bhilai, India
Anshul Kumar
IIT Bhilai, India
Manish Rai
IIT Bhilai, India
Apu Chakraborty
IIT Bhilai, India
Ashutosh Modi
Indian Institute of Technology Kanpur
Abdelaali Chaoub
INPT, Morocco
Soumajit Pramanik
IIT Bhilai, India
Moyank Giri
IIT Bhilai, India
Yashwanth Holla
IIT Kanpur, India
Sunny Kumar
IIT Bhilai, India
M. V. Kiran Sooraj
IIT Bhilai, India