TextWand: A Unified Framework for Scene Text Editing

📅 2026-06-04
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing approaches struggle to handle the three scene text editing tasks (removal, generation, and replacement) within a single framework while keeping both precise control over text appearance and an intact background. To address this, the work proposes TextWand, a unified model that decomposes complex text editing into two atomic operations: rendering and erasure. It introduces Overlay-Reference Positional Encoding (ORPE) to achieve pixel-level layout fidelity and exemplar-driven style control, complemented by a Region-Adaptive Suppression (RAS) strategy to ensure clean text removal. The study also constructs TextWand-Bench, a benchmark for general-purpose scene text editing that fills the gap left by existing single-task datasets. Experiments show that the proposed method outperforms leading open-source and closed-source models across all three editing tasks in text content accuracy, layout and style consistency, and overall image quality.
📝 Abstract
We propose TextWand, a general-purpose framework that unifies scene text removal, generation, and replacement into a single model. By decomposing complex editing tasks into the atomic primitives of rendering and erasure, TextWand achieves precise control over both text appearance and background integrity. Specifically, we introduce a novel design, Overlay-Reference Positional Encoding (ORPE), to enforce pixel-level layout fidelity and exemplar-driven style control, alongside a new strategy, Region-Adaptive Suppression (RAS), to ensure clean text erasure. To address the absence of a comprehensive benchmark for general-purpose scene text editing among existing single-task datasets, we construct TextWand-Bench. Extensive experiments demonstrate that TextWand outperforms existing leading open-source and closed-source models by delivering superior text content accuracy, layout and style consistency, and overall image quality across scene text removal, generation and replacement tasks.
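The decomposition into rendering and erasure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `Canvas` dictionary stands in for an image, and the names `Region`, `erase`, `render`, `remove_text`, `generate_text`, and `replace_text` are hypothetical, chosen only to show how the three editing tasks might be expressed as compositions of two primitives. ORPE and RAS are not modeled here.

```python
from typing import Dict, Tuple

# Hypothetical types: a box as (x, y, width, height), and a toy "image"
# that maps each text region to the string currently visible in it.
Region = Tuple[int, int, int, int]
Canvas = Dict[Region, str]


def erase(canvas: Canvas, region: Region) -> Canvas:
    """Atomic primitive: clear the text in one region, leave the rest untouched."""
    return {r: t for r, t in canvas.items() if r != region}


def render(canvas: Canvas, region: Region, text: str) -> Canvas:
    """Atomic primitive: draw text into one region, leave the rest untouched."""
    updated = dict(canvas)
    updated[region] = text
    return updated


def remove_text(canvas: Canvas, region: Region) -> Canvas:
    """Removal is erasure alone."""
    return erase(canvas, region)


def generate_text(canvas: Canvas, region: Region, text: str) -> Canvas:
    """Generation is rendering alone (the region is assumed to be empty)."""
    return render(canvas, region, text)


def replace_text(canvas: Canvas, region: Region, text: str) -> Canvas:
    """Replacement is erasure followed by rendering in the same region."""
    return render(erase(canvas, region), region, text)


if __name__ == "__main__":
    sign = (10, 20, 100, 30)
    label = (10, 80, 100, 30)
    scene = {sign: "OPEN", label: "CAFE"}
    print(replace_text(scene, sign, "CLOSED"))
```

Each function returns a new canvas rather than mutating its input, which loosely mirrors the goal of leaving everything outside the edited region unchanged.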
Problem

Research questions and friction points this paper is trying to address.

scene text editing
text removal
text generation
text replacement
unified framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

scene text editing
unified framework
Overlay-Reference Positional Encoding
Region-Adaptive Suppression
text erasure