🤖 AI Summary
Existing multimodal pretrained Transformers (MMTs) suffer from poor robustness when modalities are missing, from rigid static prompting, and from distorted completion of missing content. To address these issues, this paper proposes RAGPT, a retrieval-augmented dynamic prompt tuning framework. RAGPT employs multi-channel cross-sample retrieval to gather relevant contextual knowledge, enabling context-aware generation of missing modalities; it further builds an instance-adaptive, knowledge-transferable dynamic prompter that overcomes the limitations of static prompts and dummy-token filling. Evaluated on three real-world incomplete multimodal datasets, RAGPT consistently outperforms state-of-the-art prompt-learning and modality-imputation methods in both accuracy and robustness. It is the first framework to unify retrieval guidance, generative collaboration, and dynamic prompting within a single coherent modeling paradigm.
📝 Abstract
Multimodal learning with incomplete modalities is practical yet challenging. Recently, researchers have focused on enhancing the robustness of pre-trained MultiModal Transformers (MMTs) under missing-modality conditions by applying learnable prompts. However, these prompt-based methods face several limitations: (1) incomplete modalities provide restricted modal cues for task-specific inference, (2) dummy imputation for missing content causes information loss and introduces noise, and (3) static prompts are instance-agnostic, offering limited knowledge for instances with various missing conditions. To address these issues, we propose RAGPT, a novel Retrieval-AuGmented dynamic Prompt Tuning framework. RAGPT comprises three modules: (I) the multi-channel retriever, which identifies similar instances through a within-modality retrieval strategy, (II) the missing modality generator, which recovers missing information using retrieved contexts, and (III) the context-aware prompter, which captures contextual knowledge from relevant instances and generates dynamic prompts to substantially enhance the MMT's robustness. Extensive experiments conducted on three real-world datasets show that RAGPT consistently outperforms all competitive baselines in handling incomplete modality problems. The code of our work and prompt-based baselines is available at https://github.com/Jian-Lang/RAGPT.
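To make the three-module pipeline concrete, here is a minimal NumPy sketch of the flow the abstract describes: retrieve similar instances within an available modality, impute the missing modality from the retrieved neighbors, and derive a per-instance prompt from the retrieved context. All function names, the similarity-weighted averaging, and the projection-based prompter are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def cosine_sim(query, memory):
    # query: (d,), memory: (n, d) -> per-instance cosine similarities (n,)
    q = query / (np.linalg.norm(query) + 1e-8)
    m = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + 1e-8)
    return m @ q

def retrieve_within_modality(query, memory, k=3):
    """Multi-channel retriever (sketch): rank stored instances by cosine
    similarity computed in the modality the query actually has."""
    sims = cosine_sim(query, memory)
    idx = np.argsort(-sims)[:k]
    return idx, sims[idx]

def generate_missing(neighbor_feats, sims):
    """Missing modality generator (sketch): similarity-weighted average of
    the retrieved neighbors' features in the *missing* modality."""
    w = sims / (sims.sum() + 1e-8)
    return (w[:, None] * neighbor_feats).sum(axis=0)

def dynamic_prompt(neighbor_feats, proj):
    """Context-aware prompter (sketch): pool the retrieved context and
    project it into an instance-specific prompt vector."""
    return proj @ neighbor_feats.mean(axis=0)

rng = np.random.default_rng(0)
d = 8
# toy "memory" of complete training instances: paired text/image features
text_mem = rng.normal(size=(10, d))
img_mem = rng.normal(size=(10, d))

# test instance whose image modality is missing: retrieve via its text
q_text = rng.normal(size=d)
idx, sims = retrieve_within_modality(q_text, text_mem, k=3)
img_hat = generate_missing(img_mem[idx], sims)        # imputed image feature
prompt = dynamic_prompt(text_mem[idx], rng.normal(size=(4, d)))

print(img_hat.shape, prompt.shape)  # (8,) (4,)
```

The key contrast with the static-prompt baselines criticized above is that both `img_hat` and `prompt` change per instance, because they are computed from that instance's own retrieved neighbors rather than from a fixed learned vector or dummy tokens.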