🤖 AI Summary
This work addresses the collapse of cross-modal alignment in vision-language models under out-of-distribution (OOD) conditions by proposing a Multi-Agent Cooperative Learning framework. The framework comprises four specialized agents (image, text, name, and coordination) that interact through structured message passing, perform name learning in a shared multi-agent feature space, apply context-exchange-enhanced few-shot learning, and use an adaptive dynamic balancing mechanism. Together, these components mitigate modality imbalance and strengthen alignment robustness. Experimental results on the VISTA-Beyond dataset show that the proposed method significantly outperforms baseline approaches in both few-shot and zero-shot settings, achieving consistent accuracy gains of 1-5% across diverse visual tasks.
📝 Abstract
This paper introduces a Multi-Agent Cooperative Learning (MACL) framework to address cross-modal alignment collapse in vision-language models when handling out-of-distribution (OOD) concepts. Four core agents (image, text, name, and coordination) collaboratively mitigate modality imbalance through structured message passing. The framework enables name learning in a shared multi-agent feature space, incorporates a context-exchange-enhanced few-shot learning algorithm, and adopts an adaptive dynamic balancing mechanism to regulate inter-agent contributions. Experiments on the VISTA-Beyond dataset demonstrate that MACL significantly improves performance in both few-shot and zero-shot settings, achieving 1-5% accuracy gains across diverse visual domains.
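The structured message passing and adaptive dynamic balancing described above can be sketched in miniature. Everything here is an illustrative assumption rather than the paper's implementation: the `Agent` class, the toy two-dimensional features, and the choice of a softmax over per-agent confidence scores as the balancing rule are all invented for this sketch.

```python
import math

class Agent:
    """A hypothetical modality agent (image, text, or name) in the MACL style."""

    def __init__(self, name):
        self.name = name
        self.inbox = []  # messages received from other agents

    def send(self, other, message):
        # Structured message passing: deliver (sender, payload) to the peer.
        other.inbox.append((self.name, message))

    def encode(self):
        # Stand-in for a real modality encoder: a toy 2-D feature vector.
        return [float(len(self.name)), 1.0]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def coordinate(agents, confidences):
    """Assumed adaptive balancing: softmax-weight each agent's features."""
    weights = softmax(confidences)
    feats = [a.encode() for a in agents]
    dim = len(feats[0])
    fused = [sum(w * f[i] for w, f in zip(weights, feats)) for i in range(dim)]
    return weights, fused

if __name__ == "__main__":
    image, text, name_agent = Agent("image"), Agent("text"), Agent("name")
    # One round of message passing: each agent broadcasts its features.
    for a in (image, text, name_agent):
        for b in (image, text, name_agent):
            if a is not b:
                a.send(b, a.encode())
    # Coordination agent fuses the modalities under invented confidences.
    weights, fused = coordinate([image, text, name_agent], [0.9, 0.5, 0.2])
    print(weights, fused)
```

The softmax weighting here simply downweights low-confidence modalities; the paper's actual balancing mechanism is presumably learned rather than fixed like this.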