LAB-Det: Language as a Domain-Invariant Bridge for Training-Free One-Shot Domain Generalization in Object Detection

📅 2026-02-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the significant performance degradation of generic object detectors in data-scarce, domain-specific scenarios such as underwater or industrial defect detection, where existing few-shot cross-domain methods often rely on fine-tuning, leading to overfitting and high computational costs. To overcome these limitations, we propose the first training-free, one-shot domain generalization approach for object detection. Our method converts the single available example per class into descriptive text, leveraging language as a domain-invariant bridge to guide frozen foundation detectors—such as GLIP or Grounding DINO—toward effective cross-domain adaptation without any parameter updates. Experiments on UODD and NEU-DET benchmarks demonstrate performance gains of up to 5.4 mAP, surpassing state-of-the-art fine-tuning-based methods and highlighting the efficiency, robustness, and interpretability of language-driven adaptation.

📝 Abstract
Foundation object detectors such as GLIP and Grounding DINO excel on general-domain data but often degrade in specialized, data-scarce settings like underwater imagery or industrial defects. Typical cross-domain few-shot approaches rely on fine-tuning on scarce target data, incurring computational cost and overfitting risks. We instead ask: can a frozen detector adapt with only one exemplar per class and no training? To answer this, we introduce training-free one-shot domain generalization for object detection, where detectors must adapt to specialized domains with only one annotated exemplar per class and no weight updates. To tackle this task, we propose LAB-Det, which exploits Language As a domain-invariant Bridge. Instead of adapting visual features, we project each exemplar into a descriptive text that conditions and guides a frozen detector. This linguistic conditioning replaces gradient-based adaptation, enabling robust generalization in data-scarce domains. We evaluate on UODD (underwater) and NEU-DET (industrial defects), two widely adopted benchmarks for data-scarce detection where object boundaries are often ambiguous; on both, LAB-Det achieves up to 5.4 mAP improvement over state-of-the-art fine-tuned baselines without updating a single parameter. These results establish linguistic adaptation as an efficient and interpretable alternative to fine-tuning in specialized detection settings.
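The core idea, one exemplar per class turned into descriptive text that conditions a frozen grounding detector, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `describe_exemplar` helper and its hand-supplied attributes are assumptions standing in for whatever captioning the authors use, while the `" . "`-separated prompt format matches the multi-class text queries that Grounding DINO expects.

```python
# Sketch of LAB-Det-style linguistic conditioning (hypothetical helper names).
# One exemplar per class -> descriptive phrase -> text prompt for a frozen
# grounding detector (e.g. Grounding DINO); no weights are ever updated.

def describe_exemplar(class_name: str, attributes: list[str]) -> str:
    """Turn a one-shot exemplar into a descriptive phrase.

    In the paper the description would be derived from the exemplar image;
    here the attributes are supplied by hand as a stand-in assumption.
    """
    return f"{' '.join(attributes)} {class_name}".strip()

def build_prompt(descriptions: list[str]) -> str:
    """Grounding DINO takes multiple classes as ' . '-separated phrases."""
    return " . ".join(descriptions) + " ."

# One annotated exemplar per class (underwater example, per the UODD setting).
exemplars = {
    "sea urchin": ["dark", "spiny", "round"],
    "scallop": ["fan-shaped", "ridged"],
}
descriptions = [describe_exemplar(c, attrs) for c, attrs in exemplars.items()]
prompt = build_prompt(descriptions)
print(prompt)  # dark spiny round sea urchin . fan-shaped ridged scallop .
```

The prompt would then be passed, unchanged, to the frozen detector's text branch in place of bare class names, which is what makes the adaptation training-free.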
Problem

Research questions and friction points this paper is trying to address.

domain generalization
one-shot learning
object detection
training-free adaptation
data-scarce domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free adaptation
one-shot domain generalization
language-conditioned detection
domain-invariant representation
frozen detector