🤖 AI Summary
This work addresses linear inverse problems, including image inpainting, deblurring, and super-resolution, by proposing a diffusion posterior sampling framework grounded in Conditional Mutual Information Maximization (CIM). The core methodological innovation is to explicitly optimize the conditional mutual information $I(x_0; y \mid x_t)$ at each denoising step, so that the latent variable $x_t$ faithfully preserves task-relevant information about the ground-truth signal $x_0$, without relying on the approximate likelihood estimation that often induces reconstruction artifacts. Unlike prior approaches, CIM requires no task-specific fine-tuning or auxiliary network design, and it integrates seamlessly with standard diffusion samplers such as DDIM and PNDM. Extensive experiments across multiple inverse-problem benchmarks demonstrate consistent and significant improvements in PSNR and SSIM, yielding reconstructions with enhanced structural fidelity and sharper texture details. The framework thus unifies theoretical rigor, via its information-theoretic grounding, with practical plug-and-play applicability.
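For orientation, the quantity being maximized can be expanded with the standard chain-rule identity for conditional mutual information (a textbook identity, not a derivation specific to this paper):

```latex
I(\boldsymbol{x}_0;\, \boldsymbol{y} \mid \boldsymbol{x}_t)
  = H(\boldsymbol{x}_0 \mid \boldsymbol{x}_t)
  - H(\boldsymbol{x}_0 \mid \boldsymbol{y}, \boldsymbol{x}_t)
```

Read this way, maximizing the objective at stage $t$ asks that the measurement $\boldsymbol{y}$ remove as much of the remaining uncertainty about $\boldsymbol{x}_0$ as possible, given the current intermediate sample $\boldsymbol{x}_t$.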
📝 Abstract
Inverse problems are prevalent across various disciplines in science and engineering. In the field of computer vision, tasks such as inpainting, deblurring, and super-resolution are commonly formulated as inverse problems. Recently, diffusion models (DMs) have emerged as a promising approach for addressing noisy linear inverse problems, offering effective solutions without requiring additional task-specific training. Specifically, with the prior provided by DMs, one can sample from the posterior by combining this prior with the likelihood of the measurements. Since the likelihood is intractable, it is often approximated in the literature; however, this approximation compromises the quality of the generated images. To overcome this limitation and improve the effectiveness of DMs in solving inverse problems, we propose an information-theoretic approach. Specifically, we maximize the conditional mutual information $\mathrm{I}(\boldsymbol{x}_0; \boldsymbol{y} \mid \boldsymbol{x}_t)$, where $\boldsymbol{x}_0$ represents the reconstructed signal, $\boldsymbol{y}$ is the measurement, and $\boldsymbol{x}_t$ is the intermediate signal at stage $t$. This ensures that the intermediate signals $\boldsymbol{x}_t$ are generated in a way that the final reconstructed signal $\boldsymbol{x}_0$ retains as much information as possible about the measurement $\boldsymbol{y}$. We demonstrate that this method can be seamlessly integrated with recent approaches and, once incorporated, enhances their performance both qualitatively and quantitatively.
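To make the problem setting concrete, the sketch below sets up a toy noisy linear inverse problem $y = Ax + n$ and applies the generic measurement-consistency gradient step that diffusion-based posterior samplers interleave with denoising. This is an illustration of the setting only, not the paper's CIM objective; all names (`A`, `guidance_update`, the step size) are hypothetical.

```python
# Illustrative sketch of a noisy linear inverse problem y = A x + n and the
# kind of measurement-guided correction a diffusion posterior sampler applies
# to its running estimate of x_0. Not the paper's CIM method.
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: compress an 8-dim signal into 4 noisy linear measurements.
x_true = rng.standard_normal(8)
A = rng.standard_normal((4, 8))
y = A @ x_true + 0.01 * rng.standard_normal(4)

def guidance_update(x_hat, A, y, step=0.05):
    """One gradient step on the data-fidelity term 0.5 * ||y - A x||^2,
    pulling the current estimate x_hat toward measurement consistency."""
    grad = A.T @ (A @ x_hat - y)  # gradient of 0.5 * ||y - A x||^2
    return x_hat - step * grad

# In a sampler this step would alternate with denoising; iterated alone,
# it drives the estimate toward consistency with the measurement y.
x_hat = np.zeros(8)
for _ in range(500):
    x_hat = guidance_update(x_hat, A, y)

residual = np.linalg.norm(A @ x_hat - y)
```

Since the system is underdetermined (4 measurements for 8 unknowns), this data-fidelity term alone has many minimizers; the diffusion prior is what selects a plausible reconstruction among them, which is exactly where approximate-likelihood guidance (and the proposed information-theoretic alternative) enters.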