🤖 AI Summary
To address the sensitivity to hyperparameters, reliance on human annotations, and insufficient robustness of test-time adaptation (TTA), this paper proposes Human-in-the-Loop TTA (HILTTA), a human–machine collaborative paradigm. The method jointly optimizes active learning and hyperparameter-aware model selection: it first selects high-value samples, scored by predictive uncertainty and distributional representativeness, for human annotation; it then uses these few annotated samples to guide robust model selection, incorporating multiple regularizers to mitigate distribution shift in the validation objective. As the first work to co-optimize sample selection and model selection within HILTTA, the approach is plug-and-play and compatible with off-the-shelf TTA methods. Evaluated on five TTA benchmarks, the proposed approach significantly outperforms prior HILTTA methods and consistently avoids worst-case hyperparameter configurations across all off-the-shelf TTA methods, markedly improving both stability and generalization.
📝 Abstract
Existing test-time adaptation (TTA) approaches typically adapt models using the unlabeled test data stream. A recent line of work relaxes this assumption by introducing limited human annotation, referred to as Human-In-the-Loop Test-Time Adaptation (HILTTA) in this study. Existing HILTTA studies focus on selecting the most informative samples to label, a.k.a. active learning. In this work, we are motivated by a pitfall of TTA, namely its sensitivity to hyper-parameters, and propose to approach HILTTA by synergizing active learning and model selection. Specifically, we first select samples for human annotation (active learning) and then use the labeled data to select optimal hyper-parameters (model selection). To prevent the model selection process from overfitting to local distributions, multiple regularization techniques are employed to complement the validation objective. A sample selection strategy is further tailored to balance the active learning and model selection purposes. We demonstrate on five TTA datasets that the proposed HILTTA approach is compatible with off-the-shelf TTA methods and that such combinations substantially outperform state-of-the-art HILTTA methods. Importantly, our proposed method consistently avoids choosing the worst hyper-parameters for all off-the-shelf TTA methods. The source code is available at https://github.com/Yushu-Li/HILTTA.
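The two-stage loop described above (active sample selection, then regularized hyper-parameter selection on the labeled samples) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: it assumes entropy-based uncertainty as the acquisition score (the paper additionally weighs distributional representativeness) and a generic penalty term standing in for the paper's multiple regularizers; all function names are hypothetical.

```python
import numpy as np

def select_for_annotation(probs, k):
    """Active learning step (sketch): pick the k samples with the
    highest predictive entropy for human annotation. `probs` is an
    (n_samples, n_classes) array of softmax outputs."""
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(entropy)[-k:]  # indices of most uncertain samples

def select_hyperparameter(candidates, val_loss_fn, reg_fn, lam=0.1):
    """Model selection step (sketch): choose the hyper-parameter
    candidate minimizing validation loss on the few labeled samples
    plus a regularizer (weight `lam`) that guards against overfitting
    to the local test distribution."""
    scores = [val_loss_fn(h) + lam * reg_fn(h) for h in candidates]
    return candidates[int(np.argmin(scores))]

# Toy usage with hypothetical values:
probs = np.array([[0.9, 0.1], [0.5, 0.5], [0.8, 0.2]])
labeled_idx = select_for_annotation(probs, k=1)  # the 50/50 sample

lrs = [1e-4, 1e-3, 1e-2]
best_lr = select_hyperparameter(
    lrs,
    val_loss_fn=lambda h: (h - 1e-3) ** 2,  # stand-in validation loss
    reg_fn=lambda h: 0.0,                   # stand-in regularizer
)
```

In the paper's setting, `val_loss_fn` would evaluate the adapted model on the freshly annotated samples, and the regularizers keep the selection from chasing a validation set that reflects only the current local distribution.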