TEXTS-Diff: TEXTS-Aware Diffusion Model for Real-World Text Image Super-Resolution

📅 2026-01-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of text image super-resolution in real-world scenarios, where poor text recovery and low-quality background reconstruction stem primarily from the scarcity and limited diversity of authentic text samples in existing datasets. To overcome these limitations, the authors introduce Real-Texts, the first large-scale real-world text image dataset covering both Chinese and English scripts, and propose a TEXTS-Aware diffusion model. The method integrates abstract semantic guidance with fine-grained modeling of local text regions in a unified framework, jointly optimizing background visual fidelity and text legibility. Extensive experiments show that the approach significantly outperforms state-of-the-art methods across multiple metrics: it suppresses text distortion and hallucination, substantially improves text restoration accuracy and overall reconstruction quality in complex scenes, and generalizes well.

📝 Abstract
Real-world text image super-resolution aims to restore overall visual quality and text legibility in images suffering from diverse degradations and text distortions. However, the scarcity of text image data in existing datasets results in poor performance on text regions, and datasets consisting of isolated text samples limit the quality of background reconstruction. To address these limitations, we construct Real-Texts, a large-scale, high-quality dataset collected from real-world images, which covers diverse scenarios and contains natural text instances in both Chinese and English. We further propose the TEXTS-Aware Diffusion Model (TEXTS-Diff) to achieve high-quality generation in both background and textual regions. The approach leverages abstract concepts to improve the understanding of textual elements within visual scenes, and concrete text regions to enhance textual details; it mitigates the distortions and hallucination artifacts commonly observed in text regions while preserving high-fidelity visual scenes. Extensive experiments demonstrate that our method achieves state-of-the-art performance across multiple evaluation metrics, exhibiting superior generalization ability and text restoration accuracy in complex scenarios. All code, models, and the dataset will be released.
Problem

Research questions and friction points this paper is trying to address.

text image super-resolution
real-world degradation
text legibility
background reconstruction
text distortion
Innovation

Methods, ideas, or system contributions that make the work stand out.

text image super-resolution
diffusion model
real-world dataset
text-aware generation
hallucination mitigation
Haodong He
Amap, Alibaba Group
Xin Zhan
Machine Learning Engineer, Apple Inc.
Yancheng Bai
Amap, Alibaba Group
Rui Lan
Amap, Alibaba Group
Lei Sun
Amap, Alibaba Group
Xiangxiang Chu
Amap, Alibaba Group