CTA: Cross-Task Alignment for Better Test Time Training

📅 2025-07-07
📈 Citations: 0
✹ Influential: 0
📄 PDF
đŸ€– AI Summary
Deep learning models often suffer performance degradation under distribution shifts at test time. To address this, we propose a test-time training method that aligns cross-task representations without fine-tuning the backbone network. Our approach applies multimodal contrastive learning principles to align the latent spaces of a supervised encoder and a self-supervised encoder, avoiding gradient interference and making test-time parameter updates more semantically consistent. Crucially, the method imposes no architectural constraints on the backbone; it relies only on a lightweight alignment module for online adaptation. Evaluated on standard distribution-shift benchmarks, including CIFAR-10-C and ImageNet-C, our method consistently outperforms existing test-time adaptation approaches, achieving superior robustness and generalization across diverse corruption types and severity levels.

📝 Abstract
Deep learning models have demonstrated exceptional performance across a wide range of computer vision tasks. However, their performance often degrades significantly when faced with distribution shifts, such as domain or dataset changes. Test-Time Training (TTT) has emerged as an effective method to enhance model robustness by incorporating an auxiliary unsupervised task during training and leveraging it for model updates at test time. In this work, we introduce CTA (Cross-Task Alignment), a novel approach for improving TTT. Unlike existing TTT methods, CTA does not require a specialized model architecture and instead takes inspiration from the success of multi-modal contrastive learning to align a supervised encoder with a self-supervised one. This process enforces alignment between the learned representations of both models, thereby mitigating the risk of gradient interference, preserving the intrinsic robustness of self-supervised learning, and enabling more semantically meaningful updates at test time. Experimental results demonstrate substantial improvements in robustness and generalization over the state-of-the-art on several benchmark datasets.
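The abstract describes aligning a supervised encoder with a self-supervised one via a contrastive objective, with only a lightweight module adapted at test time. The paper's exact architecture and loss are not given here, so the following is a minimal PyTorch sketch under stated assumptions: both encoders (stand-in `nn.Linear` layers named `f_sup` and `f_ssl`) are frozen, a hypothetical `align` projection head is the only component updated online, and the alignment loss is a symmetric InfoNCE in the style of multi-modal contrastive learning.

```python
# Hedged sketch of CTA-style test-time alignment; the encoder stand-ins,
# the `align` head, and the choice of symmetric InfoNCE are assumptions,
# not the paper's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE: matching pairs on the diagonal are positives."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy frozen encoders standing in for the supervised / self-supervised backbones.
f_sup = nn.Linear(32, 16).requires_grad_(False)
f_ssl = nn.Linear(32, 16).requires_grad_(False)

# Lightweight alignment module: the only parameters updated at test time.
align = nn.Linear(16, 16)
opt = torch.optim.SGD(align.parameters(), lr=0.01)

x = torch.randn(8, 32)                            # a batch of test inputs
for _ in range(5):                                # a few online adaptation steps
    loss = info_nce(align(f_sup(x)), f_ssl(x))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the backbone gradients never flow, this kind of scheme leaves the pretrained representations intact, which is consistent with the abstract's claim of avoiding gradient interference.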
Problem

Research questions and friction points this paper is trying to address.

Enhance model robustness against distribution shifts
Align supervised and self-supervised learning representations
Improve test-time training without specialized architectures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligns supervised and self-supervised encoders contrastively
Avoids specialized architecture via cross-task alignment
Preserves robustness through gradient interference mitigation