AI Summary
This work addresses two problems: existing unsupervised real-world image dehazing methods generalize poorly under complex haze conditions, and full-model fine-tuning is costly. The authors reformulate dehazing as a semantic alignment problem in latent space and propose a reference-free, cross-modal unsupervised guidance mechanism. Specifically, they use CLIP's text-image alignment to build a haze-to-clear text-directed loss. They also introduce BiLaLoRA (Bilevel Layer-positioning LoRA), a strategy that jointly learns the LoRA parameters and automatically searches for the layers in which to inject them, so that adaptation targets the most critical layers of the network. The method outperforms state-of-the-art approaches on multiple real-world dehazing benchmarks, and the code is publicly available.
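The summary does not give the exact form of the haze-to-clear text-directed loss. One plausible form, borrowed from the directional CLIP losses used in text-guided image editing, penalizes the angle between the direction the network moves the image in CLIP embedding space and the direction from a "hazy" text prompt to a "clear" text prompt. The NumPy sketch below assumes that directional form. The names `text_directed_loss` and `normalize`, and the prompt examples in the comments, are hypothetical, and random vectors stand in for real CLIP embeddings. It is not the authors' implementation.

```python
import numpy as np

EPS = 1e-8  # guards against division by zero when an edit direction is ~0


def normalize(v):
    """Scale v to unit L2 norm along the last axis."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + EPS)


def text_directed_loss(img_hazy, img_dehazed, txt_hazy, txt_clear):
    """Assumed directional loss: 1 - cos(image edit direction, text direction).

    All inputs are CLIP-style embeddings of shape (..., d). In a real pipeline
    they would come from CLIP's image and text encoders; here they are arrays.
    """
    d_img = normalize(img_dehazed - img_hazy)  # how the network changed the image
    d_txt = normalize(txt_clear - txt_hazy)    # e.g. "a clear photo" minus "a hazy photo" (hypothetical prompts)
    cos = np.sum(d_img * d_txt, axis=-1)
    return float(np.mean(1.0 - cos))


# Toy check: random 512-d vectors stand in for CLIP embeddings.
rng = np.random.default_rng(0)
txt_hazy, txt_clear, img_hazy = rng.normal(size=(3, 512))
direction = txt_clear - txt_hazy
aligned = img_hazy + 0.5 * direction   # moved exactly along the haze-to-clear direction
opposite = img_hazy - 0.5 * direction  # moved the wrong way (towards "hazy")
print(text_directed_loss(img_hazy, aligned, txt_hazy, txt_clear))   # close to 0
print(text_directed_loss(img_hazy, opposite, txt_hazy, txt_clear))  # close to 2
```

Because the loss compares directions rather than absolute embeddings, it needs no clear reference image. It only asks that the output move away from the input in the same semantic direction as the text pair. In training, its gradient would flow through a frozen CLIP image encoder into the dehazing network; that part is omitted here.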
Abstract
Learning-based real image dehazing methods have achieved notable progress, yet they still face adaptation challenges in diverse real haze scenes. These challenges mainly stem from the lack of effective unsupervised mechanisms for unlabeled data and the heavy cost of full-model fine-tuning. To address these challenges, we propose a haze-to-clear text-directed loss that leverages CLIP's cross-modal capabilities to reformulate real image dehazing as a semantic alignment problem in latent space, thereby providing explicit unsupervised cross-modal guidance in the absence of reference images. Furthermore, we introduce the Bilevel Layer-positioning LoRA (BiLaLoRA) strategy, which learns the LoRA parameters while automatically searching for the injection layers, enabling targeted adaptation of critical network layers. Extensive experiments demonstrate that our method outperforms state-of-the-art methods on multiple real-world dehazing benchmarks. The code is publicly available at https://github.com/YanZhang-zy/BiLaLoRA.
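The idea of learning LoRA parameters and injection layers together can be illustrated with a small bilevel toy. The sketch below is an assumption for illustration and is not taken from the BiLaLoRA paper or repository. It uses a chain of frozen identity layers, each with a rank-`R` LoRA branch scaled by a hypothetical gate `sigmoid(alpha[i])`. The lower level updates the LoRA factors `A`, `B` on a training split. The upper level updates the gate logits `alpha` on a held-out split using a DARTS-style first-order approximation. Finally, the `k` layers with the largest gates are kept. A supervised regression loss stands in for the unsupervised dehazing objective, and all names (`alpha`, `loss_and_grads`, `k`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, R, L = 8, 2, 4  # toy feature dim, LoRA rank, number of candidate layers


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Frozen "pretrained" weights (identity maps, so the base network is a no-op).
W = [np.eye(D) for _ in range(L)]
# LoRA factors: layer i adds gate_i * B_i @ A_i; B starts at zero as in standard LoRA.
A = [rng.normal(scale=0.5, size=(R, D)) for _ in range(L)]
B = [np.zeros((D, R)) for _ in range(L)]
alpha = np.zeros(L)  # hypothetical placement logits, gate_i = sigmoid(alpha_i)


def forward(x):
    """Run h_{i+1} = h_i @ M_i.T and keep intermediates for backprop."""
    s = sigmoid(alpha)
    hs, Ms = [x], []
    for i in range(L):
        M = W[i] + s[i] * B[i] @ A[i]
        Ms.append(M)
        hs.append(hs[-1] @ M.T)
    return hs, Ms, s


def loss_and_grads(x, y):
    """Mean squared-error loss and manual gradients w.r.t. A, B and alpha."""
    hs, Ms, s = forward(x)
    n = x.shape[0]
    err = hs[-1] - y
    loss = float(np.sum(err ** 2) / n)
    g_h = 2.0 * err / n
    gA, gB, g_alpha = [None] * L, [None] * L, np.zeros(L)
    for i in reversed(range(L)):
        g_M = g_h.T @ hs[i]                  # dLoss/dM_i
        gB[i] = s[i] * g_M @ A[i].T
        gA[i] = s[i] * B[i].T @ g_M
        g_alpha[i] = np.sum(g_M * (B[i] @ A[i])) * s[i] * (1.0 - s[i])
        g_h = g_h @ Ms[i]                    # propagate to h_i
    return loss, gA, gB, g_alpha


# Toy task: the target map is a low-rank edit of the identity.
T = np.eye(D) + 0.1 * rng.normal(size=(D, R)) @ rng.normal(size=(R, D))
x_tr, x_val = rng.normal(size=(64, D)), rng.normal(size=(64, D))
y_tr, y_val = x_tr @ T.T, x_val @ T.T

lr_w, lr_a, k = 0.05, 0.05, 2
val_before = loss_and_grads(x_val, y_val)[0]
for step in range(500):
    # Lower level: update LoRA factors on the training split.
    _, gA, gB, _ = loss_and_grads(x_tr, y_tr)
    for i in range(L):
        A[i] -= lr_w * gA[i]
        B[i] -= lr_w * gB[i]
    # Upper level: update placement logits on the validation split
    # (first-order: ignores how alpha shifts the lower-level optimum).
    _, _, _, g_alpha = loss_and_grads(x_val, y_val)
    alpha -= lr_a * g_alpha
val_after = loss_and_grads(x_val, y_val)[0]

# Keep LoRA only in the k layers with the largest gates.
selected = sorted(np.argsort(-sigmoid(alpha))[:k].tolist())
print(f"val loss {val_before:.3f} -> {val_after:.3f}; LoRA kept in layers {selected}")
```

Splitting the data between the two levels keeps the placement logits from simply overfitting the same batch the adapters are trained on. The first-order update is the cheapest bilevel approximation. Unrolled second-order variants account for how `alpha` moves the inner optimum, but they cost extra forward and backward passes.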