🤖 AI Summary
Pure entropy minimization at test time often induces model collapse, manifesting as degenerate outputs (e.g., constant predictions), exploding logit norms, or class shift, and thereby impairs generalization. To address this, we propose ZeroSiam: an adaptive test-time inference framework built on a lightweight asymmetric Siamese architecture. Its core components are a divergence alignment mechanism, a stop-gradient operator placed before the classifier, and a learnable dynamic predictor, which jointly suppress output collapse and regularize biased learning signals. Crucially, ZeroSiam stabilizes entropy optimization during inference without requiring any training data. Extensive experiments show that ZeroSiam outperforms existing test-time adaptation methods on both vision domain adaptation and large language model reasoning tasks. Notably, it effectively mitigates collapse in small models, which are particularly collapse-prone, improving cross-environment generalization and the robustness of reasoning.
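The logit-norm shortcut mentioned above is easy to see concretely: scaling the logits of a softmax leaves the predicted class unchanged but drives the entropy toward zero, so an unconstrained entropy objective rewards norm inflation rather than meaningful learning. A minimal NumPy illustration (function name and values are ours, not from the paper):

```python
import numpy as np

def softmax_entropy(logits):
    # numerically stable softmax followed by Shannon entropy
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    return float(-(p * np.log(p + 1e-12)).sum())

logits = np.array([2.0, 1.0, 0.5])
print(softmax_entropy(logits))      # moderate entropy
print(softmax_entropy(5 * logits))  # same argmax, much lower entropy
```

Scaling by 5 changes no prediction, yet the entropy loss drops sharply; a gradient-based adapter can therefore "improve" its objective simply by growing the logit norm.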
📝 Abstract
Test-time entropy minimization helps a model adapt to novel environments and incentivizes its reasoning capability, unleashing the model's potential during inference by allowing it to evolve and improve in real time using its own predictions. However, pure entropy minimization can favor non-generalizable shortcuts, such as inflating the logit norm or driving all predictions to a dominant class, risking collapsed solutions (e.g., constant one-hot outputs) that trivially minimize the objective without meaningful learning. In this paper, we introduce ZeroSiam, an efficient asymmetric Siamese architecture tailored for test-time entropy minimization. ZeroSiam prevents collapse through asymmetric divergence alignment, achieved efficiently by a learnable predictor and a stop-gradient operator placed before the classifier. We provide empirical and theoretical evidence that ZeroSiam not only prevents collapsed solutions but also absorbs and regularizes biased learning signals, improving performance even when no collapse occurs. Despite its simplicity, extensive results show that ZeroSiam performs more stably than prior methods with negligible overhead, demonstrating efficacy on both vision adaptation and large language model reasoning tasks across challenging test scenarios and diverse models, including tiny models that are particularly collapse-prone.
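The paper's exact formulation is not reproduced here, but the role of the stop-gradient before the classifier can be sketched with a toy alignment loss. Assuming (our assumption, for illustration only) a linear predictor `W` and a squared-error divergence between the predicted branch and a stop-gradient target, the gradient flows only through the predictor branch; the target branch contributes nothing because it is treated as a constant:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
W = rng.normal(size=(d, d))   # learnable predictor (hypothetical, linear)
z1 = rng.normal(size=d)       # online-branch feature
z2 = rng.normal(size=d)       # target-branch feature, under stop-gradient

def align_loss(W):
    # 0.5 * || W z1 - stopgrad(z2) ||^2 ; z2 is held constant
    diff = W @ z1 - z2
    return 0.5 * float(diff @ diff)

# analytic gradient w.r.t. W: (W z1 - z2) z1^T
# note z2 appears only as data, never through a gradient path
grad = np.outer(W @ z1 - z2, z1)

# finite-difference check of one entry
eps = 1e-6
Wp = W.copy()
Wp[0, 0] += eps
fd = (align_loss(Wp) - align_loss(W)) / eps
print(abs(fd - grad[0, 0]))  # near zero
```

Because only the predictor side receives gradients, the alignment term can absorb biased update signals without letting them push the (frozen-target) branch toward a degenerate constant output, which is the intuition the abstract appeals to.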