FonTS: Text Rendering with Typography and Style Controls

📅 2024-11-28

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Existing diffusion transformer (DiT)-based text-to-image models render text with inconsistent fonts, varying styles, and limited fine-grained control, particularly at the word level. To address this, the paper proposes a two-stage DiT-based pipeline for word-level controllable text rendering. The method introduces typography control fine-tuning (TC-FT), a parameter-efficient scheme that updates only 5% of key parameters, together with enclosing typography control tokens (ETC-tokens) that apply typographic features to individual words, and a text-agnostic style control adapter (SCA) that improves style consistency while preventing content leakage. To train these components, the authors incorporate HTML-render into the data synthesis pipeline and build what they describe as the first word-level controllable dataset. Experiments show improved word-level typographic control, font consistency, and style consistency. According to the abstract, the datasets and models will be made available for academic use.

📝 Abstract

Visual text rendering is widespread in various real-world applications, requiring careful font selection and typographic choices. Recent progress in diffusion transformer (DiT)-based text-to-image (T2I) models shows promise in automating these processes. However, these methods still encounter challenges such as inconsistent fonts, style variation, and limited fine-grained control, particularly at the word level. This paper proposes a two-stage DiT-based pipeline to address these problems by enhancing controllability over typography and style in text rendering. We introduce typography control fine-tuning (TC-FT), a parameter-efficient fine-tuning method (on $5\%$ of key parameters) with enclosing typography control tokens (ETC-tokens), which enables precise word-level application of typographic features. To further address style inconsistency in text rendering, we propose a text-agnostic style control adapter (SCA) that prevents content leakage while enhancing style consistency. To implement TC-FT and SCA effectively, we incorporate HTML-render into the data synthesis pipeline and propose the first word-level controllable dataset. Through comprehensive experiments, we demonstrate the effectiveness of our approach in achieving superior word-level typographic control, font consistency, and style consistency in text rendering tasks. The datasets and models will be available for academic use.
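The abstract does not spell out how HTML-render produces word-level annotations, so the following is only a minimal sketch of the general idea: each word is wrapped in its own styled HTML element, and the same per-word attributes are recorded as labels. The names `WordStyle`, `style_to_css`, and `build_sample` are hypothetical, as is the choice of attributes (font family, bold, italic, underline); rendering the HTML to an image would require a browser or similar renderer and is left out.

```python
import html
from dataclasses import dataclass, asdict


@dataclass
class WordStyle:
    """Hypothetical per-word typography attributes (not taken from the paper)."""
    font_family: str = "serif"
    bold: bool = False
    italic: bool = False
    underline: bool = False


def style_to_css(style: WordStyle) -> str:
    """Translate a WordStyle into an inline CSS declaration string."""
    parts = [f"font-family: {style.font_family}"]
    if style.bold:
        parts.append("font-weight: bold")
    if style.italic:
        parts.append("font-style: italic")
    if style.underline:
        parts.append("text-decoration: underline")
    return "; ".join(parts)


def build_sample(words_with_styles):
    """Return (html_page, annotations) for a list of (word, WordStyle) pairs.

    Each word gets its own <span>, so the typography of every word is known
    exactly and can be stored alongside the HTML as a word-level label.
    """
    spans = []
    annotations = []
    for index, (word, style) in enumerate(words_with_styles):
        css = html.escape(style_to_css(style), quote=True)
        spans.append(f'<span style="{css}">{html.escape(word)}</span>')
        annotations.append({"index": index, "word": word, **asdict(style)})
    page = "<p>" + " ".join(spans) + "</p>"
    return page, annotations


if __name__ == "__main__":
    page, labels = build_sample([
        ("Grand", WordStyle()),
        ("Opening", WordStyle(font_family="monospace", bold=True)),
    ])
    print(page)
    print(labels)
```

Because the labels are generated together with the markup, they stay in sync with what is rendered, which is the main appeal of synthesizing such data from HTML instead of annotating existing images.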

Problem

Research questions and friction points this paper is trying to address.

Enhances word-level typographic control in text rendering.

Improves font consistency across text rendering applications.

Ensures style consistency without content leakage in text rendering.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage DiT pipeline for text rendering

Typography control fine-tuning with ETC-tokens

Text-agnostic style control adapter (SCA)
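To illustrate what "enclosing" control tokens could look like in a prompt, here is a small sketch that wraps selected words in paired opening and closing markers. The token strings in `ETC_TOKENS` and the function `enclose_words` are assumptions made for illustration; the page does not give the paper's actual token vocabulary or how the tokens are fed to the model.

```python
# Hypothetical opening/closing token pairs; the paper's real tokens may differ.
ETC_TOKENS = {
    "bold": ("<b>", "</b>"),
    "italic": ("<i>", "</i>"),
    "underline": ("<u>", "</u>"),
}


def enclose_words(words, controls):
    """Wrap each controlled word in its enclosing typography tokens.

    words:    list of words to be rendered.
    controls: dict mapping a word index to a list of attribute names
              (keys of ETC_TOKENS) to apply to that word only.
    """
    out = []
    for index, word in enumerate(words):
        attrs = controls.get(index, [])
        opening = "".join(ETC_TOKENS[a][0] for a in attrs)
        # Close in reverse order so the token pairs nest properly.
        closing = "".join(ETC_TOKENS[a][1] for a in reversed(attrs))
        out.append(f"{opening}{word}{closing}")
    return " ".join(out)


if __name__ == "__main__":
    print(enclose_words(["Grand", "Opening", "Sale"], {1: ["bold", "italic"]}))
```

Placing a token on both sides of a word marks exactly where a typographic attribute starts and stops, which is what makes the control word-level and not prompt-level.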

🔎 Similar Papers

MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis