🤖 AI Summary
To address domain-induced self-attention feature shifts during online test-time adaptation (TTA) of Transformer models, this paper proposes a layer-wise progressive conditioned scale-shift recalibration (PCSR) mechanism. The method models domain shift as a layer-wise progressive separation process and introduces two lightweight, differentiable networks—a Domain Separation Network and a Factor Generator Network—that operate online to dynamically predict layer-specific conditioned scale and shift parameters for each self-attention module. These parameters are applied via efficient local linear transformations to recalibrate the attention features. Crucially, the approach requires no access to source-domain data or labels and operates entirely online. Evaluated on benchmarks including ImageNet-C, it improves classification accuracy by up to 3.9%, outperforming existing online TTA methods. Key contributions include: (i) the first formulation of domain shift as a progressive, layer-wise separation process; (ii) a fully online, parameter-efficient recalibration framework; and (iii) state-of-the-art performance without source data dependency.
📝 Abstract
Online test-time adaptation aims to dynamically adjust a network model in real time based on sequential input samples during the inference stage. In this work, we find that, when applying a transformer network model to a new target domain, the Query, Key, and Value features of its self-attention module often change significantly from those in the source domain, leading to substantial performance degradation of the transformer model. To address this important issue, we propose a new approach that progressively recalibrates the self-attention at each layer using a local linear transform parameterized by conditioned scale and shift factors. We view online model adaptation from the source domain to the target domain as a progressive domain shift separation process. At each transformer network layer, we learn a Domain Separation Network to extract the domain shift feature, which is used by a Factor Generator Network to predict the scale and shift parameters for self-attention recalibration. These two lightweight networks are adapted online during inference. Experimental results on benchmark datasets demonstrate that the proposed progressive conditioned scale-shift recalibration (PCSR) method improves online test-time domain adaptation performance by up to 3.9% in classification accuracy on the ImageNet-C dataset.
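The core recalibration step described above—extract a per-layer domain-shift feature, map it to scale and shift factors, then apply a local linear transform to the attention features—can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the two networks are reduced to single linear maps, the "source statistic" is a hypothetical running mean, and all names (`domain_separation`, `factor_generator`, `recalibrate`) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8          # feature dimension of one self-attention layer
n_tokens = 4   # tokens in the current test sample

def domain_separation(feats, source_stat):
    """Toy stand-in for the Domain Separation Network: the domain-shift
    feature is the gap between the test sample's mean token feature and
    a (hypothetical) running source-domain statistic."""
    return feats.mean(axis=0) - source_stat

def factor_generator(shift_feat, W_scale, W_shift):
    """Toy stand-in for the Factor Generator Network: predict conditioned
    scale/shift factors from the domain-shift feature via linear maps."""
    gamma = shift_feat @ W_scale   # deviation from identity scale
    beta = shift_feat @ W_shift
    return gamma, beta

def recalibrate(feats, gamma, beta):
    """Local linear transform per channel: x -> (1 + gamma) * x + beta."""
    return feats * (1.0 + gamma) + beta

# One layer's Query/Key/Value-style features for a test sample.
qkv = rng.standard_normal((n_tokens, d))
source_stat = np.zeros(d)

# Zero-initialised generator weights => identity recalibration at the
# start of adaptation; online updates would then move them off zero.
W_scale = np.zeros((d, d))
W_shift = np.zeros((d, d))

shift_feat = domain_separation(qkv, source_stat)
gamma, beta = factor_generator(shift_feat, W_scale, W_shift)
out = recalibrate(qkv, gamma, beta)
```

A zero initialisation of the factor generator is a natural design choice for online TTA: the transform starts as the identity, so the adapted model initially matches the source model and only departs from it as evidence of domain shift accumulates.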