Uni-DocDiff: A Unified Document Restoration Model Based on Diffusion

📅 2025-08-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing document restoration methods rely either on monolithic single-task models—leading to system bloat and poor scalability—or on unified models constrained by handcrafted prompts and fixed priors, limiting cross-task synergy. This paper proposes a diffusion-based multi-task joint restoration framework. Its core contributions are: (1) a learnable task prompt mechanism enabling adaptive task control; (2) a Prior Pool that dynamically stores multi-scale restoration priors; and (3) a Prior Fusion Module (PFM) that adaptively integrates local high-frequency and global low-frequency priors. The framework supports end-to-end restoration of diverse degradations—including text erasure, ink corruption, and crease removal—achieving state-of-the-art or comparable performance to specialized models across multiple benchmarks. Crucially, it enables zero-shot generalization to unseen tasks, significantly enhancing model generalizability and practical deployment efficiency.

📝 Abstract
Removing various degradations from damaged documents greatly benefits digitization, downstream document analysis, and readability. Previous methods often treat each restoration task independently with dedicated models, leading to a cumbersome and highly complex document processing system. Although recent studies attempt to unify multiple tasks, they often suffer from limited scalability due to handcrafted prompts and heavy preprocessing, and fail to fully exploit inter-task synergy within a shared architecture. To address these challenges, we propose Uni-DocDiff, a Unified and highly scalable Document restoration model based on Diffusion. Uni-DocDiff develops a learnable task prompt design, ensuring exceptional scalability across diverse tasks. To further enhance its multi-task capabilities and address potential task interference, we devise a novel Prior Pool, a simple yet comprehensive mechanism that combines both local high-frequency features and global low-frequency features. Additionally, we design the Prior Fusion Module (PFM), which enables the model to adaptively select the most relevant prior information for each specific task. Extensive experiments show that the versatile Uni-DocDiff achieves performance comparable or even superior to that of task-specific expert models, while retaining the task scalability needed for seamless adaptation to new tasks.
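The abstract distinguishes local high-frequency features from global low-frequency features but does not say how they are obtained. As a rough illustration only, the sketch below splits a one-dimensional signal into a smooth low-frequency part and a residual high-frequency part. The box-blur low-pass filter and the function names `box_blur` and `split_frequencies` are assumptions made for this example, not the paper's method.

```python
def box_blur(row, radius=1):
    """Low-pass filter: average each sample with its neighbours.

    The window is clipped at the borders, so edge samples average
    over fewer values.
    """
    n = len(row)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = row[lo:hi]
        out.append(sum(window) / len(window))
    return out


def split_frequencies(row, radius=1):
    """Return (low, high) such that low[i] + high[i] == row[i]."""
    low = box_blur(row, radius)             # smooth, slowly varying content
    high = [x - l for x, l in zip(row, low)]  # edges and fine detail
    return low, high


# A toy "stroke": a bright run on a dark background.
row = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
low, high = split_frequencies(row)
```

In this toy split, the high-frequency residual is largest in magnitude around the stroke boundaries, while the low-frequency part carries the overall brightness profile.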
Problem

Research questions and friction points this paper is trying to address.

Unified model for diverse document restoration tasks
Overcomes limited scalability in existing multi-task methods
Addresses task interference via adaptive prior information fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified document restoration model using diffusion
Learnable task prompts for exceptional scalability
Prior Pool and Fusion Module for multi-task synergy
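One way to picture how a learnable task prompt could steer the selection of priors is a softmax gate over a small pool of prior features. The sketch below is an assumption-laden toy, not the paper's architecture: the task names, prompt vectors, pool keys, feature vectors, and the function `fuse_priors` are all hypothetical, and the values that would be learned in a real model are hard-coded here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical task prompts: one vector per task (learned in a real model).
TASK_PROMPTS = {
    "task_a": [1.0, 0.0],
    "task_b": [0.0, 1.0],
}

# Hypothetical prior pool: each prior has a matching key and a feature vector.
PRIOR_POOL = {
    "local_high_freq": {"key": [2.0, 0.0], "feature": [1.0, 0.0, 0.0]},
    "global_low_freq": {"key": [0.0, 2.0], "feature": [0.0, 1.0, 0.0]},
}


def fuse_priors(task):
    """Weight each prior by prompt-key similarity and return the blend."""
    prompt = TASK_PROMPTS[task]
    names = list(PRIOR_POOL)
    weights = softmax([dot(prompt, PRIOR_POOL[n]["key"]) for n in names])
    dim = len(PRIOR_POOL[names[0]]["feature"])
    fused = [
        sum(w * PRIOR_POOL[n]["feature"][i] for w, n in zip(weights, names))
        for i in range(dim)
    ]
    return dict(zip(names, weights)), fused


weights, fused = fuse_priors("task_a")
```

Under this toy setup, supporting another task only requires adding one more entry to `TASK_PROMPTS`; the pool and the fusion step stay unchanged, which is the kind of scalability the learnable-prompt design is meant to provide.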
Fangmin Zhao
Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences
Weichao Zeng
Institute of Information Engineering, Chinese Academy of Sciences
Computer Vision
Zhenhang Li
Institute of Information Engineering, CAS, China
computer vision, image generation
Dongbao Yang
Institute of Information Engineering, Chinese Academy of Sciences
Computer Vision
Binbin Li
Institute of Information Engineering, Chinese Academy of Sciences
Xiaojun Bi
Department of Computer Science, Stony Brook University
Human Computer Interaction, Mobile User Interfaces, Text Input, Human Performance Models
Yu Zhou
VCIP & TMCC & DISSec, College of Computer Science, Nankai University