LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer

📅 2025-09-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Universal image restoration (UIR) must handle unknown, mixed degradations while preserving semantic structure, yet existing methods suffer from over-smoothing, hallucination, or semantic drift. To address this, we propose the first text-prompt-free diffusion Transformer framework. Our method introduces a lightweight dual-branch conditional injection module to jointly model degradation and semantic features; a timestep-layer adaptive modulation mechanism that dynamically calibrates structural and semantic priors at each diffusion layer; and SigLIP-based feature alignment coupled with a scalable data filtering strategy. Extensive experiments on both synthetic and real-world degradation benchmarks demonstrate consistent superiority over state-of-the-art open-source and commercial models. Ablation studies confirm the necessity and efficacy of each component, yielding significant improvements in restoration quality, structural fidelity, and inference stability.

📝 Abstract
Universal image restoration (UIR) aims to recover images degraded by unknown mixtures while preserving semantics -- conditions under which discriminative restorers and UNet-based diffusion priors often oversmooth, hallucinate, or drift. We present LucidFlux, a caption-free UIR framework that adapts a large diffusion transformer (Flux.1) without image captions. LucidFlux introduces a lightweight dual-branch conditioner that injects signals from the degraded input and a lightly restored proxy to respectively anchor geometry and suppress artifacts. A timestep- and layer-adaptive modulation schedule then routes these cues across the backbone's hierarchy, yielding coarse-to-fine, context-aware updates that protect the global structure while recovering texture. To avoid the latency and instability of text prompts or MLLM captions, we enforce caption-free semantic alignment via SigLIP features extracted from the proxy. A scalable curation pipeline further filters large-scale data for structure-rich supervision. Across synthetic and in-the-wild benchmarks, LucidFlux consistently outperforms strong open-source and commercial baselines, and ablation studies verify the necessity of each component. LucidFlux shows that, for large DiTs, when, where, and what to condition on -- rather than adding parameters or relying on text prompts -- is the governing lever for robust and caption-free universal image restoration in the wild.
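The timestep- and layer-adaptive modulation described in the abstract can be pictured as a learned gate for each (timestep, layer) pair that scales the conditioning features before they are injected into a DiT block. The sketch below is a hypothetical simplification, not the paper's implementation: `gate_table` stands in for learned modulation parameters, and plain vectors stand in for transformer features.

```python
import numpy as np

def adaptive_modulation(cond_feat, t, layer, gate_table):
    """Scale an injected conditioning feature by a per-(timestep, layer) gate.

    cond_feat  : (d,) conditioning feature entering one backbone layer
    t, layer   : current diffusion timestep index and layer index
    gate_table : (num_steps, num_layers) array of learned scalars (stand-in)
    """
    g = gate_table[t, layer]  # how strongly this layer listens at this step
    return g * cond_feat

# Toy setup: 10 diffusion steps, 4 backbone layers, 8-dim features.
rng = np.random.default_rng(0)
gates = rng.uniform(0.0, 1.0, size=(10, 4))  # stand-in for learned gates
feat = np.ones(8)
out = adaptive_modulation(feat, t=3, layer=2, gate_table=gates)
```

In this picture, "when, where, and what to condition on" corresponds to the timestep index, the layer index, and the choice of conditioning branch, respectively; the gates let early steps emphasize structural cues and later steps emphasize texture.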
Problem

Research questions and friction points this paper is trying to address.

Universal image restoration without captions
Preventing oversmoothing and hallucination artifacts
Anchoring geometry while recovering texture
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large diffusion transformer adapted without captions
Dual-branch conditioner anchors geometry and suppresses artifacts
Timestep-layer modulation enables coarse-to-fine context-aware updates
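The caption-free semantic alignment listed above replaces text prompts with an image-encoder feature target: features of the restored output are pulled toward SigLIP features of the lightly restored proxy. A minimal sketch, assuming the alignment objective is a cosine-similarity loss between pooled embeddings (plain vectors stand in for SigLIP features here):

```python
import numpy as np

def semantic_alignment_loss(feat_restored, feat_proxy, eps=1e-8):
    """1 - cosine similarity between two pooled image embeddings.

    In the paper's setting both would come from a frozen SigLIP encoder;
    here they are just vectors so the sketch stays self-contained.
    """
    a = feat_restored / (np.linalg.norm(feat_restored) + eps)
    b = feat_proxy / (np.linalg.norm(feat_proxy) + eps)
    return 1.0 - float(a @ b)

loss_same = semantic_alignment_loss(np.ones(16), np.ones(16))  # near 0
```

Because the target comes from the proxy image itself, no caption or MLLM call is needed at inference time, which avoids the latency and instability the abstract attributes to prompt-based conditioning.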
👥 Authors
Song Fei — The Hong Kong University of Science and Technology (Guangzhou)
Tian Ye — The Hong Kong University of Science and Technology (Guangzhou)
Lujia Wang — The Hong Kong University of Science and Technology (Guangzhou)
Lei Zhu — The Hong Kong University of Science and Technology; The Hong Kong University of Science and Technology (Guangzhou)