Task-Aware Structured Memory for Dynamic Multi-modal In-Context Learning

📅 2026-06-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two constraints on in-context learning in multimodal large language models: finite context windows and the high computational cost of key-value (KV) caching over long sequences. Existing compression methods often introduce bias, disrupt semantic structure, and cannot adapt to new queries. To overcome these challenges, the authors propose TASM, a training-free framework for efficient multimodal context compression and retrieval. TASM uses task vectors to guide compression instead of relying on sample-dependent signals, applies semantics-aware bipartite matching to merge tokens rather than prune them, and introduces a hierarchical dynamic memory composed of a Core Memory and a Latent Bank. Experiments indicate that TASM maintains strong performance under aggressive compression ratios, balancing computational efficiency with task adaptability.
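As a rough illustration of what scoring tokens against a task-level direction could look like, the sketch below ranks tokens by cosine similarity to a single shared vector. It is not the authors' procedure: the summary does not say how the task vector is computed, so taking the mean of per-demonstration summary vectors is an assumption, and the names `task_scores` and `keep_top` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def task_scores(demo_summaries, tokens):
    """Score each token against one task-level direction.

    ASSUMPTION: the task direction is the mean of one summary vector per
    demonstration. The source does not specify how TASM builds it.
    """
    task_vec = mean_vector(demo_summaries)
    return [cosine(t, task_vec) for t in tokens]


def keep_top(scores, budget):
    """Return indices of the `budget` highest-scoring tokens, in original order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


demos = [[1.0, 0.0], [0.8, 0.2]]             # hypothetical per-demonstration summaries
tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
kept = keep_top(task_scores(demos, tokens), budget=2)
```

Because the score depends only on the shared direction, the same ranking rule applies to every demonstration, which is the contrast with sample-dependent importance estimates that the summary draws.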

📝 Abstract

Multi-modal large language models (MLLMs) depend on in-context learning (ICL) for rapid task adaptation, but their scalability is severely limited by finite context windows and the growing cost of key-value (KV) caches in long multi-modal sequences. Existing memory compression approaches typically rely on rigid token removal or sample-dependent importance estimation, which introduces bias, disrupts semantic structure (particularly for visual representations), and yields static memories that cannot adapt to new queries. We introduce TASM (Task-Aware Structured Memory), a training-free framework that addresses these limitations through task-aware, structure-preserving, and dynamically accessible memory construction. TASM employs task-vector guided compression to replace sample-specific signals with a task-level direction that captures shared relevance across demonstrations. To preserve the underlying manifold, it applies semantics-aware token merging via bipartite graph matching, aggregating tokens without destructive pruning. Finally, TASM structures memory into a hierarchy comprising a compact Core Memory and a Latent Bank, facilitating query-adaptive dynamic retrieval. Evaluations confirm that TASM maintains high performance under heavy compression, effectively balancing efficiency with adaptability.
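To make the idea of merging tokens through bipartite matching concrete, here is a minimal sketch of one common greedy scheme: split the tokens into two alternating sets, link each token in the first set to its most similar token in the second, and average the `r` most similar pairs. The abstract does not give TASM's actual partition, similarity measure, or aggregation rule, so the alternating split, cosine similarity, unweighted averaging, and the name `bipartite_merge` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def bipartite_merge(tokens, r):
    """Merge up to r tokens into their nearest neighbour across a bipartition.

    ASSUMPTIONS (not from the source): set A is the even-indexed tokens,
    set B the odd-indexed ones, similarity is cosine, and merged tokens
    are combined by an unweighted mean.
    """
    a_idx = list(range(0, len(tokens), 2))
    b_idx = list(range(1, len(tokens), 2))
    if not b_idx or r <= 0:
        return [list(t) for t in tokens]

    # One edge per A token: its most similar B token.
    edges = []
    for i in a_idx:
        j = max(b_idx, key=lambda b: cosine(tokens[i], tokens[b]))
        edges.append((cosine(tokens[i], tokens[j]), i, j))
    edges.sort(reverse=True)

    # The r strongest edges decide which A tokens are folded into B.
    merged_into = {i: j for _, i, j in edges[:r]}

    sums = {k: list(tokens[k]) for k in range(len(tokens)) if k not in merged_into}
    counts = {k: 1 for k in sums}
    for i, j in merged_into.items():
        sums[j] = [s + x for s, x in zip(sums[j], tokens[i])]
        counts[j] += 1

    # Surviving tokens keep their original relative order.
    return [[s / counts[k] for s in sums[k]] for k in sorted(sums)]


tokens = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
merged = bipartite_merge(tokens, r=2)  # four tokens reduced to two averaged tokens
```

Unlike pruning, every input token contributes to some output token, so the sequence shrinks by `r` entries while near-duplicate information is pooled instead of discarded. The averaging step is still lossy.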

Problem

Research questions and friction points this paper is trying to address.

in-context learning

multi-modal large language models

memory compression

KV cache

semantic structure

Innovation

Methods, ideas, or system contributions that make the work stand out.

Task-Aware Compression

Structured Memory

Dynamic Retrieval
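The structured-memory and dynamic-retrieval ideas listed above can be pictured with a small two-tier store: a compact set of entries that is always used, plus a larger bank from which only the entries closest to the current query are pulled in. This is a sketch under stated assumptions rather than TASM's design. The class name `HierarchicalMemory`, the top-`k` cosine lookup, and the choice to return core entries ahead of retrieved ones are hypothetical.

```python
import math
from dataclasses import dataclass, field


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


@dataclass
class HierarchicalMemory:
    """Two-tier memory: an always-present core plus a query-searched bank.

    ASSUMPTION: entries are plain vectors and retrieval is top-k by cosine
    similarity. The source does not describe the retrieval rule.
    """
    core: list = field(default_factory=list)  # compact entries used for every query
    bank: list = field(default_factory=list)  # larger pool consulted on demand

    def retrieve(self, query, k):
        """Return the core entries followed by the k bank entries nearest the query."""
        ranked = sorted(self.bank, key=lambda e: cosine(e, query), reverse=True)
        return self.core + ranked[:k]


memory = HierarchicalMemory(
    core=[[1.0, 0.0]],
    bank=[[0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]],
)
context = memory.retrieve(query=[0.0, 1.0], k=1)
```

The context passed to the model stays at `len(core) + k` entries regardless of how large the bank grows, and different queries pull different bank entries, which is what separates this layout from a static compressed memory.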