🤖 AI Summary
Existing art image generation methods are constrained by single-style inputs, such as a single reference image or domain-specific text, resulting in coarse-grained and inflexible style control. This work introduces the first end-to-end text-driven artistic image generation framework to enable multi-granular style disentanglement: a tunable style encoder accepts arbitrary textual style descriptions and independently modulates structural layout, fine-grained detail, and global style intensity. We further propose the first art-guided super-resolution module, which injects painter-level brushstroke and texture features into diffusion outputs. Built on a text-conditioned diffusion backbone, our method integrates a lightweight super-resolution network and an accelerated sampling strategy. Extensive evaluations demonstrate state-of-the-art performance on FID, LPIPS, and human assessments, with superior generation quality, diversity, and real-time inference.
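The summary describes the control path only at a high level. As a minimal sketch of how a tunable style encoder could expose separate knobs for structure, fine detail, and global style intensity, the PyTorch snippet below blends three control vectors before conditioning a generator; the module names, dimensions, and weighted-blending scheme are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a multi-granular, text-driven style pipeline.
# The three-way split (structure / detail / global intensity) and the
# blending rule are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


class TunableStyleEncoder(nn.Module):
    """Maps a text-style embedding into three separate control signals."""

    def __init__(self, text_dim: int = 512, ctrl_dim: int = 256):
        super().__init__()
        self.structure_head = nn.Linear(text_dim, ctrl_dim)  # layout control
        self.detail_head = nn.Linear(text_dim, ctrl_dim)     # fine-grained detail
        self.intensity_head = nn.Linear(text_dim, 1)          # global style strength

    def forward(self, style_text_emb: torch.Tensor):
        return (
            self.structure_head(style_text_emb),
            self.detail_head(style_text_emb),
            torch.sigmoid(self.intensity_head(style_text_emb)),
        )


def stylize(backbone, style_encoder, content_emb, style_text_emb,
            w_structure: float = 1.0, w_detail: float = 1.0):
    """Blend the control signals, then condition the generator.

    `backbone` stands in for any text-conditioned generator; the weighted
    sum below is just one plausible way to expose per-granularity knobs.
    """
    structure, detail, intensity = style_encoder(style_text_emb)
    condition = content_emb + intensity * (w_structure * structure + w_detail * detail)
    return backbone(condition)


if __name__ == "__main__":
    enc = TunableStyleEncoder()
    backbone = nn.Identity()              # placeholder for the real generator
    content = torch.randn(1, 256)         # pretend content embedding
    style_text = torch.randn(1, 512)      # pretend text-style embedding
    out = stylize(backbone, enc, content, style_text, w_structure=0.5, w_detail=1.5)
    print(out.shape)
```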
📝 Abstract
In this work, we propose a complete framework for generating visual art. Unlike previous stylization methods, which offer little flexibility over style parameters (i.e., they allow stylization with only a single style image, a single stylization text, or content images from a specific domain), our method has no such restriction. In addition, we implement an improved version that generates a wide range of results with varying degrees of detail, style, and structure, while also boosting generation speed. To further enhance the results, we insert an artistic super-resolution module into the generative pipeline; this module adds fine details such as painter-specific patterns and subtle brush marks.
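As a rough illustration of how such a super-resolution stage might be slotted in after the base generator, the sketch below upscales a generated image and adds a small residual texture branch intended to carry painter-specific patterns and brush marks; the architecture and all names here are assumptions, not the paper's actual module.

```python
# Illustrative sketch: appending an artistic super-resolution stage to a
# generation pipeline. The residual "texture branch" is an assumption about
# how style-specific detail could be injected on top of plain upscaling.
import torch
import torch.nn as nn


class ArtisticSuperResolution(nn.Module):
    def __init__(self, channels: int = 3, scale: int = 2):
        super().__init__()
        self.upsample = nn.Upsample(scale_factor=scale, mode="bilinear",
                                    align_corners=False)
        # Small residual branch meant to add high-frequency, style-specific
        # texture (brush marks, canvas grain) to the upscaled image.
        self.texture_branch = nn.Sequential(
            nn.Conv2d(channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, channels, kernel_size=3, padding=1),
        )

    def forward(self, low_res: torch.Tensor) -> torch.Tensor:
        base = self.upsample(low_res)
        return base + self.texture_branch(base)


# Usage: run the base generator first, then refine its output.
generator_output = torch.rand(1, 3, 128, 128)   # stand-in for a generated image
sr = ArtisticSuperResolution(scale=2)
high_res = sr(generator_output)                 # -> shape (1, 3, 256, 256)
print(high_res.shape)
```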