Connector-S: A Survey of Connectors in Multi-modal Large Language Models

📅 2025-02-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
The design of connectors in multimodal large language models (MLLMs) has not been systematically analyzed, which limits understanding of how these components function and slows the development of stronger connectors. This survey reviews current connector designs and organizes them in a two-level taxonomy: (i) atomic operations, covering mapping, compression, and mixture of experts; and (ii) holistic designs, covering multi-layer, multi-encoder, and multi-modal scenarios. It also discusses open research directions, including high-resolution input, dynamic compression, guide information selection, combination strategy, and interpretability. The survey is intended as a reference and roadmap for designing and optimizing next-generation connectors.

📝 Abstract
With the rapid advancements in multi-modal large language models (MLLMs), connectors play a pivotal role in bridging diverse modalities and enhancing model performance. However, the design and evolution of connectors have not been comprehensively analyzed, leaving gaps in understanding how these components function and hindering the development of more powerful connectors. In this survey, we systematically review the current progress of connectors in MLLMs and present a structured taxonomy that categorizes connectors into atomic operations (mapping, compression, mixture of experts) and holistic designs (multi-layer, multi-encoder, multi-modal scenarios), highlighting their technical contributions and advancements. Furthermore, we discuss several promising research frontiers and challenges, including high-resolution input, dynamic compression, guide information selection, combination strategy, and interpretability. This survey is intended to serve as a foundational reference and a clear roadmap for researchers, providing valuable insights into the design and optimization of next-generation connectors to enhance the performance and adaptability of MLLMs.
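The three atomic operations named in the abstract can be illustrated with a toy sketch. This is not code from the survey: the function names, dimensions, random weights, average pooling as the compression step, and soft gating as the mixture-of-experts step are all assumptions chosen for illustration, and real connectors use learned weights inside a deep learning framework.

```python
import math
import random

def matvec(weights, vec):
    """Multiply a (d_out x d_in) matrix by a d_in-dimensional vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def mapping(tokens, weights):
    """Mapping: linearly project each visual token to the LLM embedding size."""
    return [matvec(weights, tok) for tok in tokens]

def compression(tokens, window):
    """Compression: average-pool every `window` consecutive tokens into one."""
    pooled = []
    for start in range(0, len(tokens), window):
        group = tokens[start:start + window]
        dim = len(group[0])
        pooled.append([sum(tok[i] for tok in group) / len(group) for i in range(dim)])
    return pooled

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_of_experts(tokens, gate_weights, expert_weights):
    """Mixture of experts: blend several projections using per-token soft gates."""
    outputs = []
    for tok in tokens:
        gates = softmax(matvec(gate_weights, tok))
        expert_outs = [matvec(w, tok) for w in expert_weights]
        dim = len(expert_outs[0])
        outputs.append(
            [sum(g * out[i] for g, out in zip(gates, expert_outs)) for i in range(dim)]
        )
    return outputs

def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
VISION_DIM, LLM_DIM, NUM_TOKENS, NUM_EXPERTS = 4, 6, 8, 2  # illustrative sizes
visual_tokens = [[rng.uniform(-1, 1) for _ in range(VISION_DIM)] for _ in range(NUM_TOKENS)]

mapped = mapping(visual_tokens, random_matrix(LLM_DIM, VISION_DIM, rng))
compressed = compression(mapped, window=2)
mixed = mixture_of_experts(
    compressed,
    gate_weights=random_matrix(NUM_EXPERTS, LLM_DIM, rng),
    expert_weights=[random_matrix(LLM_DIM, LLM_DIM, rng) for _ in range(NUM_EXPERTS)],
)
print(len(mapped), len(compressed), len(mixed), len(mixed[0]))  # prints: 8 4 4 6
```

In this sketch, mapping changes the token dimension (4 to 6), compression halves the token count (8 to 4), and the mixture-of-experts step keeps both unchanged while weighting two candidate projections per token.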
Problem

Research questions and friction points this paper is trying to address.

Analyze connector design in MLLMs
Categorize connectors into atomic operations and holistic designs
Explore research frontiers for connector optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic review of MLLM connectors
Taxonomy that categorizes connectors into atomic operations and holistic designs
Discussion of open challenges, including high-resolution input and dynamic compression
Xun Zhu
Department of Electronic Engineering, Tsinghua University
Zheng Zhang
Department of Electronic Engineering, Tsinghua University
Xi Chen
Department of Electronic Engineering, Tsinghua University
Yiming Shi
University of Electronic Science and Technology of China
Efficient AI, Parameter Efficient Fine Tuning, Diffusion, Multimodal
Miao Li
Department of Electronic Engineering, Tsinghua University
Ji Wu
Tsinghua University
Artificial Intelligence, smart healthcare, machine learning, pattern recognition, speech recognition