TEXTS-Diff: TEXTS-Aware Diffusion Model for Real-World Text Image Super-Resolution

📅 2026-01-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of text image super-resolution in real-world scenarios, where poor text recovery and low-quality background reconstruction stem primarily from the scarcity and limited diversity of authentic text samples in existing datasets. To overcome these limitations, the authors introduce Real-Texts, the first large-scale real-world text image dataset covering both Chinese and English scripts, and propose a TEXTS-Aware diffusion model. The method integrates abstract semantic guidance with fine-grained modeling of local text regions in a unified framework, jointly optimizing background visual fidelity and text legibility. Extensive experiments show that the approach significantly outperforms state-of-the-art methods across multiple metrics: it suppresses text distortion and hallucination, substantially improves text restoration accuracy and overall reconstruction quality in complex scenes, and generalizes well.

📝 Abstract
Real-world text image super-resolution aims to restore overall visual quality and text legibility in images suffering from diverse degradations and text distortions. However, the scarcity of text image data in existing datasets results in poor performance on text regions, and datasets consisting of isolated text samples limit the quality of background reconstruction. To address these limitations, we construct Real-Texts, a large-scale, high-quality dataset collected from real-world images, which covers diverse scenarios and contains natural text instances in both Chinese and English. We further propose the TEXTS-Aware Diffusion Model (TEXTS-Diff) to achieve high-quality generation in both background and textual regions. The approach leverages abstract concepts to improve the understanding of textual elements within visual scenes, and concrete text regions to enhance textual details; it mitigates the distortions and hallucination artifacts commonly observed in text regions while preserving high-fidelity visual scenes. Extensive experiments demonstrate that our method achieves state-of-the-art performance across multiple evaluation metrics, exhibiting superior generalization ability and text restoration accuracy in complex scenarios. All code, models, and the dataset will be released.
Problem

Research questions and friction points this paper is trying to address.

text image super-resolution
real-world degradation
text legibility
background reconstruction
text distortion
Innovation

Methods, ideas, or system contributions that make the work stand out.

text image super-resolution
diffusion model
real-world dataset
text-aware generation
hallucination mitigation
Haodong He
Amap, Alibaba Group
Xin Zhan
Machine Learning Engineer, Apple Inc.
Yancheng Bai
Amap, Alibaba Group
Rui Lan
Amap, Alibaba Group
Lei Sun
Amap, Alibaba Group
Xiangxiang Chu
Amap, Alibaba Group