🤖 AI Summary
Diffusion classifiers (DCs) suffer from noise instability: performance varies drastically across different sampling noise realizations, necessitating ~100-sample ensembling for robustness—severely hindering inference efficiency. To address this, we introduce the novel concept of “good noise,” formalized by two principles—frequency matching and spatial matching—and propose a learnable, image-conditioned meta-network that generates parameterized noise. Integrated within a joint framework of pretrained diffusion models and vision-language models, our method enables end-to-end training via gradient-based optimization of noise parameters. Experiments demonstrate that only 5–10 noise samples suffice to outperform conventional 100-sample ensembling, achieving substantial reductions in noise-induced variance, consistent improvements in classification accuracy across multiple benchmarks, and over 10× speedup in inference time—without architectural modifications or additional inference-time computation.
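The diffusion-classifier scoring rule the summary refers to can be sketched in a few lines. The snippet below is a toy numpy illustration only: the conditional noise predictor `eps_theta`, the per-class "signature" vectors, and the simplified forward noising are hypothetical stand-ins for a real pretrained diffusion model, chosen so the example is self-contained and runnable.

```python
import numpy as np

rng = np.random.default_rng(0)
D, NUM_CLASSES = 16, 3

# Hypothetical stand-in for a pretrained conditional diffusion model: each
# class has a fixed "clean image" signature, and the noise predictor assumes
# the clean image under class c is that signature.
class_signatures = rng.normal(size=(NUM_CLASSES, D))

def eps_theta(x_t, c):
    """Toy conditional noise predictor eps_theta(x_t, c)."""
    return x_t - class_signatures[c]

def dc_classify(x0, noises):
    """Diffusion-classifier rule: pick the class whose conditioning gives the
    lowest epsilon-prediction error, averaged over the sampled noises."""
    errors = np.zeros(NUM_CLASSES)
    for eps in noises:
        x_t = x0 + eps  # simplified forward noising (no timestep schedule)
        for c in range(NUM_CLASSES):
            errors[c] += np.mean((eps_theta(x_t, c) - eps) ** 2)
    return int(np.argmin(errors))

# Classify a toy "image" drawn near class 1, ensembling 5 noise samples.
x0 = class_signatures[1] + 0.1 * rng.normal(size=D)
noises = [rng.normal(size=D) for _ in range(5)]
pred = dc_classify(x0, noises)
print(pred)
```

In a real DC the inner loop is a full denoising-error evaluation per class per noise, which is why ensembling ~100 noises is so expensive and why reducing the ensemble to 5–10 "good" noises yields the reported speedup.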
📝 Abstract
Although today's pretrained discriminative vision-language models (e.g., CLIP) have demonstrated strong perception abilities, such as zero-shot image classification, they also suffer from the bag-of-words problem and spurious bias. To mitigate these problems, some pioneering studies leverage powerful generative models (e.g., pretrained diffusion models) to realize generalizable image classification, dubbed the Diffusion Classifier (DC). Specifically, by randomly sampling Gaussian noise, a DC classifies an image by comparing the denoising effects achieved under different category conditions. Unfortunately, an inherent and notorious weakness of existing DCs is noise instability: different randomly sampled noises lead to significant performance changes. To achieve stable classification performance, existing DCs always ensemble the results of hundreds of sampled noises, which significantly slows classification. To this end, we first explore the role of noise in DCs and conclude that there exist some “good noises” that can relieve this instability. Meanwhile, we argue that good noises should satisfy two principles: Frequency Matching and Spatial Matching. Guided by both principles, we propose a novel Noise Optimization method that learns matching (i.e., good) noise for DCs: NoOp. For Frequency Matching, NoOp first optimizes a dataset-specific noise: given a dataset and a timestep t, it optimizes a single randomly initialized parameterized noise. For Spatial Matching, NoOp trains a Meta-Network that takes an image as input and outputs an image-specific noise offset. The sum of the optimized noise and the noise offset then replaces the random noise in the DC. Extensive ablations on various datasets demonstrate the effectiveness of NoOp.
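To make the noise-optimization idea concrete, here is a minimal numpy sketch of the dataset-level (Frequency Matching) step: a single parameterized noise is refined by gradient descent so that the epsilon-prediction error under the true condition shrinks. The imperfect noise predictor, the closed-form gradient, and the learning rate are all hypothetical toy choices; the actual method backpropagates through a pretrained diffusion model and additionally trains a Meta-Network that outputs image-specific noise offsets, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 16

x0 = rng.normal(size=D)  # toy "image"
signature = x0.copy()    # toy clean-image estimate under the true class

def eps_theta(x_t):
    """Hypothetical imperfect noise predictor: it recovers the injected noise
    only up to a nonlinear error term that depends on the noised input."""
    return x_t - signature + 0.3 * np.tanh(x_t)

def loss(eps):
    """Epsilon-prediction error under the true class for noise eps."""
    r = eps_theta(x0 + eps) - eps
    return np.mean(r ** 2)

# Optimize a single randomly initialized parameterized noise by gradient
# descent (analytic gradient of the mean-squared residual above).
eps = rng.normal(size=D)
loss_before = loss(eps)
lr = 5.0
for _ in range(500):
    r = eps_theta(x0 + eps) - eps
    grad = (2.0 / D) * r * 0.3 * (1.0 - np.tanh(x0 + eps) ** 2)
    eps -= lr * grad
loss_after = loss(eps)
print(loss_before, "->", loss_after)
```

The optimized `eps` plays the role of the dataset-specific noise; at inference it would be summed with a meta-network's image-specific offset and passed to the classifier in place of a freshly sampled Gaussian noise.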