UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing

📅 2026-02-02

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Existing unified multimodal models show limited performance on image generation and editing tasks that require deep reasoning, and they often treat the two tasks in isolation. This work proposes UniReason, a novel framework that, for the first time, formulates generation and editing as a coherent “plan–refine” reasoning process, unifying world-knowledge-enhanced textual reasoning with self-reflective visual refinement. The authors construct a reasoning dataset spanning five knowledge domains and an agent-generated visual refinement corpus, and design a unified multitask architecture to support this paradigm. The proposed method achieves state-of-the-art performance on reasoning-intensive benchmarks, including WISE, KrisBench, and UniREditBench, while preserving strong general-purpose image synthesis capabilities.

📝 Abstract

Unified multimodal models often struggle with complex synthesis tasks that demand deep reasoning, and typically treat text-to-image generation and image editing as isolated capabilities rather than interconnected reasoning steps. To address this, we propose UniReason, a unified framework that harmonizes these two tasks through two complementary reasoning paradigms. We incorporate world-knowledge-enhanced textual reasoning into generation to infer implicit knowledge, and leverage editing capabilities for fine-grained visual refinement that corrects remaining visual errors via self-reflection. This approach unifies generation and editing within a shared architecture, mirroring the human cognitive process of planning followed by refinement. We support this framework by systematically constructing a large-scale reasoning-centric dataset (~300k samples) covering five major knowledge domains (e.g., cultural commonsense and physics) for textual reasoning, alongside an agent-generated corpus for visual refinement. Extensive experiments demonstrate that UniReason achieves advanced performance on reasoning-intensive benchmarks such as WISE, KrisBench, and UniREditBench, while maintaining superior general synthesis capabilities.
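
The planning-then-refinement process described in the abstract can be pictured as a simple control loop. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function `plan_then_refine`, the `Trace` record, the four callables (`reason`, `generate`, `reflect`, `edit`), and the `max_rounds` parameter are all hypothetical names, and images are stood in for by plain strings so the example runs without any model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Trace:
    """Record of one plan-then-refine run (hypothetical structure)."""
    plan: str
    images: List[str] = field(default_factory=list)
    critiques: List[str] = field(default_factory=list)


def plan_then_refine(
    prompt: str,
    reason: Callable[[str], str],
    generate: Callable[[str], str],
    reflect: Callable[[str, str], Optional[str]],
    edit: Callable[[str, str], str],
    max_rounds: int = 2,
) -> Trace:
    """Assumed control flow: textual planning, a draft, then edit-style fixes."""
    # Stage 1: textual reasoning makes implicit world knowledge explicit.
    plan = reason(prompt)
    trace = Trace(plan=plan)

    # Produce an initial draft image from the enriched plan.
    image = generate(plan)
    trace.images.append(image)

    # Stage 2: self-reflection; each critique is applied as an edit instruction.
    for _ in range(max_rounds):
        critique = reflect(plan, image)
        if critique is None:  # no remaining visual error was found
            break
        trace.critiques.append(critique)
        image = edit(image, critique)
        trace.images.append(image)
    return trace


# Toy stand-ins for the model calls (assumptions for illustration only).
def toy_reason(prompt: str) -> str:
    return prompt + " | note: ice melts above 0 C"


def toy_generate(plan: str) -> str:
    return "draft"


def toy_reflect(plan: str, image: str) -> Optional[str]:
    return None if image.endswith("+fix") else "add meltwater puddle"


def toy_edit(image: str, critique: str) -> str:
    return image + "+fix"


trace = plan_then_refine(
    "an ice cube left in the sun",
    toy_reason, toy_generate, toy_reflect, toy_edit,
)
print(trace.images)
```

In this sketch the refinement step reuses the same `edit` interface that a standalone editing request would use, which is one way to read the abstract's claim that generation and editing share a single architecture.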

Problem

Research questions and friction points this paper is trying to address.

multimodal reasoning

image generation

image editing

world knowledge

unified framework

Innovation

Methods, ideas, or system contributions that make the work stand out.

unified reasoning

world knowledge alignment

text-to-image generation