Who Needs Labels? Adapting Vision Foundation Models With the Metadata You Already Have

📅 2026-06-03

🤖 AI Summary

Labeled data is scarce in many scientific domains, and conventional supervised fine-tuning risks compromising the generalizability and robustness of vision foundation models. This work proposes FINO, a self-supervised domain adaptation method that needs no labels: it uses generic metadata, both discrete and continuous, as a self-supervisory signal that guides the model to preserve task-relevant information while suppressing irrelevant variation. FINO combines a standard self-supervised objective with a lightweight metadata-guided mechanism, so downstream deployment requires only a simple probe head. In experiments across subcellular fluorescence microscopy, Earth observation, wildlife monitoring, and medical imaging, FINO consistently outperforms existing unsupervised domain adaptation approaches, fully supervised fine-tuning baselines, and domain-specific state-of-the-art methods.

📝 Abstract

We propose a label-free approach to adapt powerful but generic vision foundation models to specialized scientific domains. Standard supervised fine-tuning is often ill-suited to these settings: labels are scarce, and task-specific training can collapse the model's generality and hurt robustness. We instead leverage metadata to adapt representations to new domains in a self-supervised manner. Our method, FINO, combines a standard self-supervised objective with flexible metadata guidance that handles both highly granular discrete metadata and continuous metadata. It encourages the representation to preserve informative factors while suppressing spurious ones. Across subcellular fluorescence microscopy, Earth observation, wildlife monitoring, and medical imaging, FINO consistently outperforms standard unsupervised domain adaptation and fully supervised adaptation. It also exceeds the highly specialized, domain-specific state of the art, while using no task labels for backbone adaptation and only lightweight probes for supervision.
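The abstract says task labels are used only by lightweight probes on top of the adapted backbone. As a rough illustration of that evaluation pattern, and not the paper's code, the sketch below fits a closed-form ridge-regression probe on frozen features. Everything here is an assumption: the function names `fit_linear_probe` and `predict` are hypothetical, the features are synthetic stand-ins for backbone embeddings, and the paper may use a different probe (for example logistic regression or a small MLP).

```python
import numpy as np

def fit_linear_probe(features, labels, num_classes, l2=1e-3):
    """Fit a ridge-regression probe on frozen features (hypothetical helper)."""
    n, d = features.shape
    x = np.hstack([features, np.ones((n, 1))])   # append a bias column
    y = np.eye(num_classes)[labels]              # one-hot targets
    # Solve the regularized normal equations (X^T X + l2 I) W = X^T Y.
    return np.linalg.solve(x.T @ x + l2 * np.eye(d + 1), x.T @ y)

def predict(features, w):
    """Predict the class with the highest probe score."""
    x = np.hstack([features, np.ones((features.shape[0], 1))])
    return np.argmax(x @ w, axis=1)

# Synthetic stand-in for embeddings from a frozen, already adapted backbone.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 16)) * 5.0
labels = rng.integers(0, 3, size=300)
features = centers[labels] + rng.normal(size=(300, 16))

# Only the probe weights are fitted; the features are never updated.
w = fit_linear_probe(features[:200], labels[:200], num_classes=3)
accuracy = float(np.mean(predict(features[200:], w) == labels[200:]))
print(f"held-out probe accuracy: {accuracy:.2f}")
```

The point of the pattern is that the backbone receives no gradient from task labels, so probe accuracy reflects how much task-relevant information the label-free adaptation left in the representation.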

Problem

Research questions and friction points this paper is trying to address.

vision foundation models

label-free adaptation

scientific domains

metadata

self-supervised learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

label-free adaptation

vision foundation models

metadata-guided learning