CausalFSFG: Rethinking Few-Shot Fine-Grained Visual Categorization from Causal Perspective

📅 2025-12-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing few-shot fine-grained visual categorization (FS-FGVC) methods overlook the distribution shifts and spurious correlations that arise when the support set acts as a confounder, which biases the learning of discriminative features. This paper introduces structural causal modeling (SCM) into FS-FGVC for the first time, establishing a causal framework that models the relationship between images (cause) and subcategories (effect). It proposes two complementary intervention mechanisms: (i) an Interventional Multi-Scale Encoder (IMSE), which intervenes at the sample level to disentangle confounding factors, and (ii) Interventional Masked Feature Reconstruction (IMFR), which intervenes at the feature level to identify genuine causal pathways. Extensive experiments on CUB-200-2011, Stanford Dogs, and Stanford Cars demonstrate new state-of-the-art performance, with significant improvements in discriminative robustness under few-shot settings and in cross-task generalization.

📝 Abstract
Few-shot fine-grained visual categorization (FS-FGVC) focuses on identifying various subcategories within a common superclass given only one or a few support examples. Most existing methods aim to boost classification accuracy by enriching the extracted features with discriminative part-level details. However, they often overlook the fact that the set of support samples acts as a confounding variable, which hampers FS-FGVC performance by introducing a biased data distribution and misguiding the extraction of discriminative features. To address this issue, we propose a new causal FS-FGVC (CausalFSFG) approach that draws on causal inference and corrects biased data distributions through causal intervention. Specifically, based on the structural causal model (SCM), we argue that FS-FGVC infers the subcategories (i.e., effect) from the inputs (i.e., cause), whereas both the few-shot condition disturbance and the inherent fine-grained nature (i.e., large intra-class variance and small inter-class variance) introduce unobservable variables that bring spurious correlations, compromising the final classification performance. To eliminate these spurious correlations, our CausalFSFG approach incorporates two key components: (1) an interventional multi-scale encoder (IMSE), which conducts sample-level interventions, and (2) interventional masked feature reconstruction (IMFR), which conducts feature-level interventions. Together, they reveal the real causal relations from inputs to subcategories. Extensive experiments and thorough analyses on widely used public datasets, including CUB-200-2011, Stanford Dogs, and Stanford Cars, demonstrate that CausalFSFG achieves new state-of-the-art performance. The code is available at https://github.com/PKU-ICST-MIPL/CausalFSFG_TMM.
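
The abstract does not say which adjustment formula the paper uses. As a generic illustration of why a confounder biases what a model learns, the sketch below applies the textbook backdoor adjustment, P(Y | do(X)) = Σ_s P(Y | X, s) P(s), to a toy problem with one observed binary confounder S. All probability values and function names are invented for illustration and are not taken from the paper.

```python
# Toy illustration of backdoor adjustment with a binary confounder S.
# Every number below is a made-up value, not a result from the paper.

P_S = {0: 0.5, 1: 0.5}              # P(S = s)
P_X1_GIVEN_S = {0: 0.2, 1: 0.8}     # P(X = 1 | S = s)
P_Y1_GIVEN_XS = {                   # P(Y = 1 | X = x, S = s)
    (0, 0): 0.1, (1, 0): 0.3,
    (0, 1): 0.6, (1, 1): 0.8,
}


def p_x_given_s(x, s):
    """P(X = x | S = s) for binary x."""
    p1 = P_X1_GIVEN_S[s]
    return p1 if x == 1 else 1.0 - p1


def observational(x):
    """P(Y=1 | X=x): strata are weighted by P(s | x), which shifts with x."""
    p_x = sum(p_x_given_s(x, s) * P_S[s] for s in P_S)
    return sum(
        P_Y1_GIVEN_XS[(x, s)] * p_x_given_s(x, s) * P_S[s] / p_x for s in P_S
    )


def interventional(x):
    """P(Y=1 | do(X=x)): strata are weighted by the prior P(s) instead."""
    return sum(P_Y1_GIVEN_XS[(x, s)] * P_S[s] for s in P_S)


if __name__ == "__main__":
    for x in (0, 1):
        print(x, round(observational(x), 4), round(interventional(x), 4))
```

In this toy setup, the observed gap between X=1 and X=0 is 0.70 - 0.20 = 0.50, while the interventional gap is 0.55 - 0.35 = 0.20. The difference is the spurious part contributed by S.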
Problem

Research questions and friction points this paper is trying to address.

Addresses biased data distribution in few-shot fine-grained visual categorization
Eliminates spurious correlations caused by few-shot condition disturbance
Improves classification by revealing real causalities from inputs to subcategories
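
To make the few-shot condition concrete, here is a minimal sketch of sampling one N-way K-shot episode, the evaluation protocol commonly used in few-shot learning. The abstract does not describe the paper's exact protocol, so the function name `sample_episode` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Sample one episode; returns (support, query) lists of (index, label).

    Hypothetical helper: `labels` is a list holding the class label of each
    dataset item, and `rng` is a random.Random instance for reproducibility.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Only classes with enough samples for both support and query qualify.
    eligible = sorted(
        c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query
    )
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query samples")

    support, query = [], []
    for c in rng.sample(eligible, n_way):
        picked = rng.sample(by_class[c], k_shot + n_query)
        support += [(i, c) for i in picked[:k_shot]]   # K labelled examples
        query += [(i, c) for i in picked[k_shot:]]     # held-out test items
    return support, query


if __name__ == "__main__":
    toy_labels = [c for c in range(10) for _ in range(20)]  # 10 classes x 20
    s, q = sample_episode(toy_labels, n_way=5, k_shot=1, n_query=15,
                          rng=random.Random(0))
    print(len(s), len(q))
```

With `k_shot=1`, each class is represented by a single support image, so whatever is incidental to that image (pose, background, lighting) can be mistaken for a class cue. This is the kind of support-set bias the points above refer to.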
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal intervention addresses biased data distributions
Interventional multi-scale encoder conducts sample-level interventions
Interventional masked feature reconstruction conducts feature-level interventions
Authors
Zhiwen Yang
Beihang University
Low-level Vision, AIGC, Medical Image Analysis
Jinglin Xu
School of Intelligence Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
Yuxin Peng
Wangxuan Institute of Computer Technology, Peking University, Beijing, 100871, China