Advancing Cache-Based Few-Shot Classification via Patch-Driven Relational Gated Graph Attention

📅 2025-12-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the dual challenges of label scarcity and cross-domain shift in few-shot image classification, this paper proposes a cache adaptation method based on modeling relations among image patches. Unlike existing cache adapters (e.g., Tip-Adapter) that rely solely on CLIP's global embeddings, our approach introduces a relational gated graph attention network that explicitly models discriminative interactions among local image patches, coupled with a patch-driven relational refinement mechanism that learns cache adapter weights from intra-image patch dependencies rather than from a single global embedding. During training, this relational structure is distilled into the cache while the CLIP backbone stays frozen; because the graph module is not used at inference, it adds no overhead beyond a standard cache lookup. A residual fusion of cache similarity scores with CLIP zero-shot logits preserves zero-shot generalization. Across 11 benchmarks, our method shows consistent gains over state-of-the-art CLIP adapter and cache-based baselines. Furthermore, we construct a real-world, battlefield-oriented Injured vs. Uninjured Soldier dataset for casualty recognition to validate efficacy in time-critical, UAV-driven rescue scenarios.
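The residual fusion step can be sketched in the style of Tip-Adapter, whose formulation weights each cached one-hot label by an affinity exp(-beta * (1 - cos)) between the query feature and the cached key, then adds the result to CLIP's zero-shot logits. This is a minimal pure-Python sketch under that assumption; the function names, the placeholder values alpha=1.0 and beta=5.5, and the logit scale of 100 are illustrative and not taken from this paper. In the proposed method the cache keys would be the relationally refined ones learned during training, but the lookup itself stays this cheap at inference.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    # Assumes a non-zero vector.
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def cache_logits(query, cache_keys, cache_values, beta):
    """Tip-Adapter-style cache lookup (assumed form, not this paper's code).

    cache_keys: one feature vector per few-shot training image.
    cache_values: matching one-hot label vectors.
    """
    q = l2_normalize(query)
    # Affinity A_k = exp(-beta * (1 - cos(query, key_k))).
    affinities = [math.exp(-beta * (1.0 - dot(q, l2_normalize(k)))) for k in cache_keys]
    num_classes = len(cache_values[0])
    # Affinity-weighted sum of one-hot labels gives per-class cache scores.
    return [sum(a * v[c] for a, v in zip(affinities, cache_values)) for c in range(num_classes)]


def zero_shot_logits(query, text_weights, scale=100.0):
    # Scaled cosine similarity to one text embedding per class.
    q = l2_normalize(query)
    return [scale * dot(q, l2_normalize(w)) for w in text_weights]


def fused_logits(query, cache_keys, cache_values, text_weights, alpha=1.0, beta=5.5):
    # Residual fusion: zero-shot logits plus alpha-weighted cache logits.
    zs = zero_shot_logits(query, text_weights)
    cl = cache_logits(query, cache_keys, cache_values, beta)
    return [z + alpha * c for z, c in zip(zs, cl)]


logits = fused_logits(
    query=[1.0, 0.1],
    cache_keys=[[1.0, 0.0], [0.0, 1.0]],
    cache_values=[[1.0, 0.0], [0.0, 1.0]],
    text_weights=[[1.0, 0.2], [0.2, 1.0]],
)
print(logits)
```

Setting alpha to 0 recovers pure zero-shot CLIP, which is why this residual form is described as preserving zero-shot behavior.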

📝 Abstract
Few-shot image classification remains difficult under limited supervision and visual domain shift. Recent cache-based adaptation approaches (e.g., Tip-Adapter) partly address this challenge by learning lightweight residual adapters over frozen features, yet they still inherit CLIP's tendency to encode global, general-purpose representations that are not discriminative enough to adapt a generalist model to a specialist domain in low-data regimes. We address this limitation with a novel patch-driven relational refinement that learns cache adapter weights from intra-image patch dependencies rather than treating an image embedding as a monolithic vector. Specifically, we introduce a relational gated graph attention network that constructs a patch graph and performs edge-aware attention to emphasize informative inter-patch interactions, producing context-enriched patch embeddings. A learnable multi-aggregation pooling then composes these into compact, task-discriminative representations that better align cache keys with the target few-shot classes. Crucially, the graph refinement is used only during training to distil relational structure into the cache, so it incurs no inference cost beyond a standard cache lookup. Final predictions are obtained by a residual fusion of cache similarity scores with CLIP zero-shot logits. Extensive evaluations on 11 benchmarks show consistent gains over state-of-the-art CLIP adapter and cache-based baselines while preserving zero-shot efficiency. We further validate battlefield relevance by introducing an Injured vs. Uninjured Soldier dataset for casualty recognition, motivated by the operational need to support triage decisions within the "platinum minutes" and the broader "golden hour" window in time-critical UAV-driven search-and-rescue and combat casualty care.
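One way to read "edge-aware attention over a patch graph" is as scaled dot-product attention over a fully connected graph of patch embeddings, where each edge weight is additionally multiplied by a sigmoid gate computed from a relational edge feature. The sketch below assumes cosine similarity as that edge feature, a scalar gate with hypothetical parameters gate_w and gate_b (learned in practice, fixed here), no learned projection matrices, and a residual node update; the paper's actual layer may differ in all of these choices.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    # Assumes non-zero vectors.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_graph_attention(patches, gate_w=4.0, gate_b=-1.0):
    """One hypothetical gated attention step over a fully connected patch graph.

    patches: N patch embeddings, each a list of d floats.
    Returns (updated patch embeddings, N x N attention matrix).
    """
    n, d = len(patches), len(patches[0])
    updated, attention = [], []
    for i in range(n):
        # Content score for every edge i -> j (scaled dot product).
        scores = [dot(patches[i], patches[j]) / math.sqrt(d) for j in range(n)]
        m = max(scores)  # subtract max for numerical stability
        # Edge gate in (0, 1) from the relational feature (cosine similarity).
        gates = [sigmoid(gate_w * cosine(patches[i], patches[j]) + gate_b) for j in range(n)]
        # Softmax numerators scaled by the gate, then renormalized per row;
        # equivalent to softmax_j(score_ij + log gate_ij).
        weights = [math.exp(s - m) * g for s, g in zip(scores, gates)]
        total = sum(weights)
        weights = [w / total for w in weights]
        attention.append(weights)
        # Aggregate neighbor messages and add them to the node (residual update).
        message = [sum(weights[j] * patches[j][k] for j in range(n)) for k in range(d)]
        updated.append([patches[i][k] + message[k] for k in range(d)])
    return updated, attention


patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
refined, att = gated_graph_attention(patches)
print(att[0])
```

Each row of the attention matrix sums to 1, and the gate down-weights edges between dissimilar patches, which is the intended effect of edge-aware gating.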
Problem

Improves few-shot image classification under limited supervision and visual domain shift
Enhances discriminative power via patch-level relational refinement
Maintains zero-shot efficiency while boosting domain adaptation accuracy
Innovation

Patch-driven relational refinement for cache adaptation
Relational gated graph attention network for patch interactions
Multi-aggregation pooling for task-discriminative representations
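The multi-aggregation pooling step can be illustrated as a learnable blend of several pooling operators over the refined patch embeddings. This sketch assumes three operators (mean, max, and attention pooling against a query vector) mixed by softmax-normalized weights; the operator set, the hypothetical mix_logits and query parameters, and the function names are illustrative assumptions, not the paper's definition.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(patches):
    n = len(patches)
    return [sum(p[k] for p in patches) / n for k in range(len(patches[0]))]


def max_pool(patches):
    return [max(p[k] for p in patches) for k in range(len(patches[0]))]


def attention_pool(patches, query):
    # Score each patch against a query vector (learnable in practice).
    weights = softmax([sum(q * x for q, x in zip(query, p)) for p in patches])
    return [sum(w * p[k] for w, p in zip(weights, patches)) for k in range(len(patches[0]))]


def multi_aggregation_pool(patches, mix_logits, query):
    """Blend mean, max, and attention pooling with softmax mixing weights."""
    pooled = [mean_pool(patches), max_pool(patches), attention_pool(patches, query)]
    mix = softmax(mix_logits)
    d = len(patches[0])
    return [sum(m * v[k] for m, v in zip(mix, pooled)) for k in range(d)]


patches = [[1.0, 2.0], [3.0, 0.0]]
image_vector = multi_aggregation_pool(patches, mix_logits=[0.0, 0.0, 0.0], query=[1.0, 0.0])
print(image_vector)
```

With uniform mixing logits the output is the plain average of the three pooled vectors; training the logits lets the model lean toward whichever aggregation is most discriminative for the target classes.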