FACTOR: Counterfactual Training-Free Test-Time Adaptation for Open-Vocabulary Object Detection

📅 2026-05-04
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the vulnerability of open-vocabulary object detection to spurious correlations between non-causal visual attributes (such as brightness and texture) and object categories under distribution shifts. To mitigate this issue, the paper introduces a training-free test-time adaptation framework for this task that incorporates explicit counterfactual reasoning. Specifically, it generates counterfactual views of test images by perturbing non-causal attributes and compares region-level predictions between original and counterfactual views to quantify attribute sensitivity. Based on this sensitivity, the method selectively suppresses unreliable predictions without requiring online optimization, enabling attribute-specific correction. Experiments on PASCAL-C, COCO-C, and FoggyCityscapes show that the proposed approach consistently outperforms existing test-time adaptation methods, improving model robustness under distribution shifts.
📝 Abstract
Open-vocabulary object detection often fails under distribution shifts, as it can be misled by spurious correlations between non-causal visual attributes (e.g., brightness, texture) and object categories. Existing test-time adaptation (TTA) methods either depend on costly online optimization or perform global calibration, overlooking the attribute-specific nature of these failures. To address this, we propose FACTOR (counterFACtual training-free Test-time adaptation for Open-vocabulaRy object detection), a lightweight framework grounded in counterfactual reasoning. By perturbing test images along non-causal attributes and comparing region-level predictions between original and counterfactual views, FACTOR quantifies attribute sensitivity, semantic relevance, and prediction variation to selectively suppress attribute-dependent predictions, all without parameter updates. Experiments on PASCAL-C, COCO-C, and FoggyCityscapes show that FACTOR consistently outperforms prior TTA methods, demonstrating that explicit counterfactual reasoning effectively improves robustness under distribution shifts.
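The perturb-compare-suppress idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the brightness scaling, the total variation distance used as a sensitivity measure, the exponential down-weighting, and the toy scorer standing in for a real open-vocabulary detector are all assumptions made for illustration. Only attribute sensitivity is sketched here; the semantic relevance and prediction variation terms mentioned in the abstract are omitted.

```python
import numpy as np

def perturb_brightness(image, factor):
    """Build a counterfactual view by scaling brightness, keeping values in [0, 1]."""
    return np.clip(image * factor, 0.0, 1.0)

def region_sensitivity(p_orig, p_cf):
    """Per-region total variation distance between class distributions (assumed measure)."""
    return 0.5 * np.abs(p_orig - p_cf).sum(axis=1)

def suppress_scores(p_orig, sensitivity, strength=2.0):
    """Down-weight each region's top class score by its sensitivity (assumed rule)."""
    return p_orig.max(axis=1) * np.exp(-strength * sensitivity)

def toy_score_fn(image, boxes):
    """Hypothetical stand-in for a detector: two-class scores from mean region intensity."""
    probs = []
    for (x0, y0, x1, y1) in boxes:
        m = image[y0:y1, x0:x1].mean()
        probs.append([m, 1.0 - m])
    return np.array(probs)

def adapt(image, boxes, score_fn, factor=0.5, strength=2.0):
    """Score original and counterfactual views, then suppress sensitive regions.
    No parameters are updated; only the output scores are rescaled."""
    p_orig = score_fn(image, boxes)
    p_cf = score_fn(perturb_brightness(image, factor), boxes)
    sens = region_sensitivity(p_orig, p_cf)
    return suppress_scores(p_orig, sens, strength), sens

# Toy example: the left region is bright, the right region is black.
image = np.zeros((4, 8))
image[:, :4] = 0.8
boxes = [(0, 0, 4, 4), (4, 0, 8, 4)]  # (x0, y0, x1, y1)
adapted, sens = adapt(image, boxes, toy_score_fn)
```

In this toy setup the left region's prediction changes when brightness is halved, so its score is reduced, while the right region's prediction is unaffected by the perturbation and keeps its original score.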
Problem

Research questions and friction points this paper is trying to address.

open-vocabulary object detection
distribution shift
spurious correlations
non-causal attributes
test-time adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

counterfactual reasoning
test-time adaptation
open-vocabulary object detection
distribution shift
training-free
Kaixiang Zhao
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
Mao Ye
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
Lihua Zhou
Centre for Artificial Intelligence and Robotics, Hong Kong Institute of Science and Innovation, CAS
Machine Learning, Transfer Learning
Hu Wang
Research Scientist, MBZUAI
Medical AI, CV, ML, Reinforcement Learning
Luping Ji
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
Song Tang
Institute of Machine Intelligence (IMI), University of Shanghai for Science and Technology, Shanghai, China
Xiatian Zhu
University of Surrey
Machine Learning, Computer Vision