TextWand: A Unified Framework for Scene Text Editing

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing approaches struggle to handle the three scene text editing tasks (removal, generation, and replacement) within a single framework while also ensuring precise control over text appearance and preserving the background. To address this, the work proposes a unified model, TextWand, that decomposes complex text editing into two atomic operations: rendering and erasure. It introduces Overlay-Reference Positional Encoding (ORPE) to achieve pixel-level layout fidelity and exemplar-driven style control, complemented by a Region-Adaptive Suppression (RAS) strategy to ensure clean text removal. The study also constructs TextWand-Bench, a comprehensive benchmark for general-purpose scene text editing, filling a gap left by existing single-task datasets. Experiments show that the proposed method outperforms leading open-source and closed-source models across all three editing tasks in text accuracy, layout and style consistency, and overall image quality.

📝 Abstract

We propose TextWand, a general-purpose framework that unifies scene text removal, generation, and replacement into a single model. By decomposing complex editing tasks into the atomic primitives of rendering and erasure, TextWand achieves precise control over both text appearance and background integrity. Specifically, we introduce a novel design, Overlay-Reference Positional Encoding (ORPE), to enforce pixel-level layout fidelity and exemplar-driven style control, alongside a new strategy, Region-Adaptive Suppression (RAS), to ensure clean text erasure. To address the absence of a comprehensive benchmark for general-purpose scene text editing among existing single-task datasets, we construct TextWand-Bench. Extensive experiments demonstrate that TextWand outperforms existing leading open-source and closed-source models by delivering superior text content accuracy, layout and style consistency, and overall image quality across scene text removal, generation and replacement tasks.
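
The decomposition into rendering and erasure can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Primitive`, `Step`, and `decompose`, the box format, and the task strings are all hypothetical, and the assumption that replacement means erasure followed by rendering is an interpretation that the abstract does not spell out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple

# Hypothetical region format: (x0, y0, x1, y1) in pixels.
Box = Tuple[int, int, int, int]


class Primitive(Enum):
    """The two atomic operations named in the abstract."""
    ERASE = "erase"
    RENDER = "render"


@dataclass(frozen=True)
class Step:
    primitive: Primitive
    region: Box
    text: Optional[str] = None  # only meaningful for RENDER steps


def decompose(task: str, region: Box, new_text: Optional[str] = None) -> List[Step]:
    """Map an editing task onto a sequence of atomic steps.

    Assumption (not confirmed by the source): replacement is modeled as
    erasing the old text and then rendering the new text in the same region.
    """
    if task == "removal":
        return [Step(Primitive.ERASE, region)]
    if task in ("generation", "replacement"):
        if new_text is None:
            raise ValueError(f"task '{task}' requires new_text")
        render = Step(Primitive.RENDER, region, new_text)
        if task == "generation":
            return [render]
        return [Step(Primitive.ERASE, region), render]
    raise ValueError(f"unknown task: {task}")


if __name__ == "__main__":
    for step in decompose("replacement", (10, 20, 110, 60), "OPEN"):
        print(step.primitive.value, step.region, step.text)
```

In this framing, a single model only needs to support two operations, and each of the three tasks becomes a short plan over them.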

Problem

Research questions and friction points this paper is trying to address.

scene text editing

text removal

text generation

text replacement

unified framework

Innovation

Methods, ideas, or system contributions that make the work stand out.

scene text editing

unified framework

Overlay-Reference Positional Encoding