Implicit Modeling for Transferability Estimation of Vision Foundation Models

📅 2025-10-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods struggle to accurately assess the downstream transferability of vision foundation models—across heterogeneous architectures, diverse training strategies, and task-aligned objectives—without fine-tuning. To address this, we propose Implicit Transferability Modeling (ITM), a framework that employs a Divide-and-Conquer Variational Approximation (DVA) to implicitly characterize the evolution of embedding spaces, enabling efficient and stable cross-architecture and cross-task transferability estimation. ITM requires no fine-tuning and imposes no assumptions on model architecture or task-head design, and thus achieves superior efficiency and generalizability. Extensive experiments on large-scale benchmarks demonstrate that ITM significantly outperforms state-of-the-art methods in estimation stability, predictive accuracy, and computational efficiency. By providing a reliable, architecture- and task-agnostic evaluation paradigm, ITM facilitates rapid screening and deployment of vision foundation models.

📝 Abstract
Transferability estimation identifies the best pre-trained models for downstream tasks without incurring the high computational cost of full fine-tuning. This capability facilitates deployment and advances the pre-training and fine-tuning paradigm. However, existing methods often struggle to accurately assess transferability for emerging pre-trained models with diverse architectures, training strategies, and task alignments. In this work, we propose Implicit Transferability Modeling (ITM), a novel framework that implicitly models each model's intrinsic transferability, coupled with a Divide-and-Conquer Variational Approximation (DVA) strategy to efficiently approximate embedding space evolution. This design enables generalization across a broader range of models and downstream tasks. Extensive experiments on a comprehensive benchmark that spans extensive training regimes and a wider variety of model types demonstrate that ITM consistently outperforms existing methods in terms of stability, effectiveness, and efficiency.
Problem

Research questions and friction points this paper is trying to address.

Estimating transferability of vision foundation models
Overcoming limitations of existing assessment methods
Generalizing across diverse architectures and tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Implicitly models intrinsic transferability of foundation models
Uses variational approximation for embedding space evolution
Generalizes across diverse model architectures and tasks
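To make the problem setting concrete, the sketch below illustrates what a fine-tuning-free transferability proxy looks like in general: it scores how well a model's frozen embeddings already separate the target classes, without any gradient updates. This is a generic, hypothetical illustration (a diagonal-Gaussian class-conditional likelihood), not the paper's ITM or DVA method; the function name and scoring rule are assumptions for exposition only.

```python
import numpy as np

def transferability_score(features, labels, eps=1e-3):
    """Toy fine-tuning-free transferability proxy (NOT the paper's ITM):
    average class-conditional diagonal-Gaussian log-likelihood of frozen
    embeddings. Higher scores suggest the embedding space already
    separates the downstream classes well."""
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    d = features.shape[1]
    total = 0.0
    for c in np.unique(labels):
        x = features[labels == c]
        mu = x.mean(axis=0)
        var = x.var(axis=0) + eps  # diagonal covariance, eps for stability
        # Per-sample Gaussian log-density under the class model
        ll = -0.5 * (((x - mu) ** 2 / var).sum(axis=1)
                     + np.log(var).sum() + d * np.log(2 * np.pi))
        total += ll.sum()
    return total / len(features)

# Synthetic check: well-separated embeddings should score higher than noise.
rng = np.random.default_rng(0)
good = np.concatenate([rng.normal(0, 1, (50, 8)), rng.normal(6, 1, (50, 8))])
bad = rng.normal(0, 3, (100, 8))
y = np.array([0] * 50 + [1] * 50)
assert transferability_score(good, y) > transferability_score(bad, y)
```

A ranking of candidate foundation models under such a proxy costs only one forward pass per model, which is the efficiency property the abstract emphasizes; ITM's contribution is making that ranking stable across heterogeneous architectures and task heads.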
Yaoyan Zheng
BUAA
Computer Vision · Deep Learning · Digital Image Processing
Huiqun Wang
School of Computer Science and Engineering, Beihang University, Beijing, China
Nan Zhou
School of Computer Science and Engineering, Beihang University, Beijing, China
Di Huang
School of Computer Science and Engineering, Beihang University, Beijing, China