🤖 AI Summary
This work addresses the limitations of existing test-time adaptation methods, which either rely on backpropagation—hindering deployment on resource-constrained devices—or adopt gradient-free strategies with limited adaptability. The authors propose a novel zeroth-order prompt optimization framework that eliminates the need for backpropagation by tuning prompts through forward-only computation, guided by objectives on intermediate feature statistics and prediction entropy. Key innovations include the first zeroth-order prompt tuning paradigm for test-time adaptation, a dynamically decaying perturbation scale, and a theoretical convergence guarantee under streaming distribution shifts. Experiments demonstrate that the method achieves 59.52% Top-1 accuracy on ImageNet-C (5K, severity level 5), outperforming both mainstream gradient-based approaches and the state-of-the-art gradient-free method FOA (58.13%), while maintaining strong generalization even on INT8 quantized models.
📝 Abstract
Test-Time Adaptation (TTA) is essential for enabling deep learning models to handle real-world data distribution shifts. However, current approaches face significant limitations: backpropagation-based methods are ill-suited to low-end deployment devices due to their high computation and memory requirements and their tendency to modify model weights during adaptation, while traditional backpropagation-free techniques exhibit constrained adaptation capability. In this work, we propose Forward-Only Zeroth-Order Optimization (FOZO), a novel and practical backpropagation-free paradigm for TTA. FOZO performs memory-efficient zeroth-order prompt optimization, guided by objectives on both intermediate feature statistics and prediction entropy. To ensure efficient and stable adaptation over the out-of-distribution data stream, we introduce a dynamically decaying perturbation scale for zeroth-order gradient estimation and theoretically prove its convergence under the TTA data-stream assumption. Extensive continual adaptation experiments on ImageNet-C, ImageNet-R, and ImageNet-Sketch demonstrate FOZO's superior performance: it achieves 59.52% Top-1 accuracy on ImageNet-C (5K, severity level 5), outperforming mainstream gradient-based methods and the state-of-the-art forward-only method FOA (58.13%). Furthermore, FOZO exhibits strong generalization on quantized (INT8) models. These findings demonstrate that FOZO is a highly competitive solution for TTA deployment in resource-limited scenarios.
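To make the core mechanism concrete, below is a minimal, hypothetical sketch of forward-only zeroth-order prompt adaptation on a toy model. It is not the authors' implementation: the model, objective (entropy minimization only, omitting the feature-statistics term), step sizes, and all function names (`forward`, `zo_step`, etc.) are illustrative assumptions. It shows the two ingredients the abstract names: a two-point (SPSA-style) gradient estimate built from forward passes alone, and a perturbation scale that decays as the test stream progresses.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy_loss(logits):
    # Mean Shannon entropy of the predicted distributions (TTA's usual proxy objective).
    p = softmax(logits)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())

def forward(x, prompt, W):
    # Toy "frozen model": the learnable prompt shifts features before a fixed linear head.
    return (x + prompt) @ W

def zo_step(x, prompt, W, step, mu0=0.1, decay=0.99, lr=0.5):
    # Two-point zeroth-order gradient estimate: two forward passes, no backprop.
    mu = mu0 * decay ** step                     # dynamically decaying perturbation scale
    u = rng.standard_normal(prompt.shape)        # random perturbation direction
    l_plus = entropy_loss(forward(x, prompt + mu * u, W))
    l_minus = entropy_loss(forward(x, prompt - mu * u, W))
    g = (l_plus - l_minus) / (2 * mu) * u        # unbiased estimate of the loss gradient
    return prompt - lr * g                       # descend on the estimated gradient

# Streaming adaptation: update the prompt on each incoming (simulated) test batch.
d, k = 8, 4
W = rng.standard_normal((d, k))                  # frozen model weights
prompt = np.zeros(d)                             # the only adapted parameters
losses = []
for step in range(50):
    x = rng.standard_normal((16, d))             # simulated shifted test batch
    losses.append(entropy_loss(forward(x, prompt, W)))
    prompt = zo_step(x, prompt, W, step)
```

Only the prompt is updated and each step costs two forward passes, which is what makes this style of adaptation attractive for memory-constrained or quantized deployments; the decaying `mu` trades early exploration for later stability on the stream.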