🤖 AI Summary
Manual selection of diffusion timesteps for few-shot dense prediction introduces task bias and yields suboptimal performance. Method: This paper proposes a dual-module framework, Task-aware Timestep Selection (TTS) and Timestep Feature Consolidation (TFC), that for the first time treats diffusion timesteps as learnable, task-oriented variables within a denoising diffusion probabilistic model. TTS selects timesteps via timestep-wise task losses and cross-timestep feature-similarity scores, while TFC consolidates the selected multi-scale features through parameter-efficient adapter-based fine-tuning. Contribution/Results: Evaluated on the large-scale Taskonomy benchmark, the method significantly improves few-shot generalization across diverse dense prediction tasks. It shows strong robustness and consistent performance on arbitrary unseen tasks while eliminating reliance on heuristic timestep scheduling.
📝 Abstract
Denoising diffusion probabilistic models have brought tremendous advances in generative tasks, achieving state-of-the-art performance. Current diffusion-based applications exploit the learned visual representations from the multistep forward-backward Markovian process for single-task prediction by attaching a task-specific decoder. However, the heuristic selection of diffusion timestep features still relies heavily on empirical intuition, often leading to sub-optimal performance biased toward certain tasks. To alleviate this constraint, we investigate the significance of versatile diffusion timestep features by adaptively selecting the timesteps best suited for few-shot dense prediction on an arbitrary unseen task. To this end, we propose two modules: Task-aware Timestep Selection (TTS), which selects ideal diffusion timesteps based on timestep-wise losses and similarity scores, and Timestep Feature Consolidation (TFC), which consolidates the selected timestep features to improve dense predictive performance in a few-shot setting. Together with our parameter-efficient fine-tuning adapter, our framework achieves superior dense prediction performance given only a few support queries. We empirically validate our learnable timestep consolidation method on the large-scale, challenging Taskonomy dataset for dense prediction, particularly in practical universal and few-shot learning scenarios.
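As a rough illustration of the selection idea described above (a minimal sketch, not the paper's actual algorithm: the function name, greedy scheme, and similarity threshold below are our own assumptions), a TTS-style procedure could rank timesteps by their per-timestep task loss and then greedily keep the best ones, skipping any timestep whose features are nearly redundant with an already-selected one:

```python
import numpy as np

def select_timesteps(losses, features, k=3, sim_threshold=0.9):
    """Hypothetical sketch of loss- and similarity-driven timestep selection.

    losses:   array of shape (T,), per-timestep task loss on the support set
    features: array of shape (T, D), one feature vector per diffusion timestep
    k:        number of timesteps to keep
    sim_threshold: cosine-similarity cutoff above which a candidate timestep
                   is considered redundant with an already-selected one
    """
    order = np.argsort(losses)  # lowest task loss first
    selected = []
    for t in order:
        f = features[t] / np.linalg.norm(features[t])
        # keep t only if it is not too similar to every selected timestep
        if all(f @ (features[s] / np.linalg.norm(features[s])) < sim_threshold
               for s in selected):
            selected.append(t)
        if len(selected) == k:
            break
    return selected
```

The selected timesteps' features would then be fused (in the paper, via TFC's adapter-based consolidation); here the selection step alone shows how a loss criterion and a redundancy check can replace hand-picked timesteps.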