🤖 AI Summary
Connector design in multimodal large language models (MLLMs) has not been analyzed systematically, which limits both performance gains and interpretability. This survey organizes MLLM connectors into a two-level taxonomy: (i) atomic operations, namely mapping, compression, and mixture of experts; and (ii) holistic designs, covering multi-layer, multi-encoder, and multi-modal scenarios. It also discusses open research frontiers, including high-resolution input, dynamic compression, guided information selection, combination strategies, and interpretability. The result is a structured reference that clarifies how connectors bridge modalities and offers a roadmap toward more capable and adaptable next-generation connectors.
📝 Abstract
With the rapid advancement of multi-modal large language models (MLLMs), connectors play a pivotal role in bridging diverse modalities and enhancing model performance. However, the design and evolution of connectors have not been comprehensively analyzed, leaving gaps in our understanding of how these components function and hindering the development of more powerful connectors. In this survey, we systematically review current progress on connectors in MLLMs and present a structured taxonomy that categorizes connectors into atomic operations (mapping, compression, mixture of experts) and holistic designs (multi-layer, multi-encoder, multi-modal scenarios), highlighting their technical contributions and advancements. Furthermore, we discuss several promising research frontiers and challenges, including high-resolution input, dynamic compression, guided information selection, combination strategies, and interpretability. This survey is intended to serve as a foundational reference and a clear roadmap for researchers, providing insights into the design and optimization of next-generation connectors to enhance the performance and adaptability of MLLMs.
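To make the three atomic operations named in the abstract concrete, the following is a minimal illustrative sketch, not an implementation taken from the survey. It assumes a connector receives a list of visual token vectors and must produce vectors in the language model's embedding width. All function names (`linear_map`, `avg_pool`, `moe_map`), the dimensions, and the choice of average pooling and a dense softmax gate are assumptions made for illustration only; real connectors use learned weights and a wide range of other designs.

```python
import math
import random

def linear_map(tokens, weight):
    """Mapping (assumed form): project each token from d_in to d_out with one weight matrix."""
    return [[sum(t[i] * weight[i][j] for i in range(len(t)))
             for j in range(len(weight[0]))] for t in tokens]

def avg_pool(tokens, stride):
    """Compression (assumed form): average every `stride` consecutive tokens into one."""
    pooled = []
    for start in range(0, len(tokens), stride):
        group = tokens[start:start + stride]
        pooled.append([sum(col) / len(group) for col in zip(*group)])
    return pooled

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_map(tokens, gate, experts):
    """Mixture of experts (assumed form): a per-token softmax gate mixes several linear experts."""
    gate_logits = linear_map(tokens, gate)               # n_tokens x n_experts
    expert_outs = [linear_map(tokens, w) for w in experts]
    outputs = []
    for n, logits in enumerate(gate_logits):
        probs = softmax(logits)
        d_out = len(expert_outs[0][n])
        outputs.append([sum(p * expert_outs[e][n][j] for e, p in enumerate(probs))
                        for j in range(d_out)])
    return outputs

def rand_matrix(rows, cols, rng):
    """Random stand-in for learned parameters or encoder features."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes: 8 visual tokens of width 4, LLM embedding width 6, 2 experts.
rng = random.Random(0)
d_vis, d_llm, n_tokens, n_experts = 4, 6, 8, 2
visual_tokens = rand_matrix(n_tokens, d_vis, rng)

compressed = avg_pool(visual_tokens, stride=2)                    # 8 tokens -> 4 tokens
mapped = linear_map(compressed, rand_matrix(d_vis, d_llm, rng))   # width 4 -> width 6
mixed = moe_map(compressed,
                rand_matrix(d_vis, n_experts, rng),
                [rand_matrix(d_vis, d_llm, rng) for _ in range(n_experts)])
```

In this sketch, compression shortens the token sequence, mapping changes the token width, and the mixture-of-experts variant replaces the single projection with a gated combination of projections. Holistic designs in the taxonomy's sense would compose such operations across layers, encoders, or modalities.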