🤖 AI Summary
This work addresses multilingual logo generation, a task where existing text-to-image methods often distort character geometry when applying creative styles and struggle to support new languages without additional training. The authors propose LogoDiffuser, a training-free framework that takes the target characters as image inputs rather than textual prompts and fuses their geometric structure with visual style through a multimodal diffusion transformer. The method analyzes the joint attention mechanism to identify core tokens (tokens that respond strongly to textual structures), aggregates attention maps across layers so that the selected core tokens stay consistent, and injects the most informative attention maps to combine typographic form with design aesthetics. Because characters are supplied as images, structure control does not depend on the language. The authors report state-of-the-art performance in multilingual logo generation, supported by experiments and user studies.
📝 Abstract
Recent advances in text-to-image generation have been remarkable, but generating multilingual design logos that harmoniously integrate visual and textual elements remains challenging. Existing methods often distort character geometry when applying creative styles and struggle to support multilingual text generation without additional training. To address these challenges, we propose LogoDiffuser, a training-free method that synthesizes multilingual logo designs using a multimodal diffusion transformer. Instead of using textual prompts, we input the target characters as images, enabling robust control of character structure regardless of language. We first analyze the joint attention mechanism to identify core tokens, that is, tokens that respond strongly to textual structures. Building on this observation, our method integrates character structure and visual design by injecting the most informative attention maps. Furthermore, we aggregate attention maps layer-wise to mitigate attention shifts across layers and obtain consistent core tokens. Extensive experiments and user studies demonstrate that our method achieves state-of-the-art performance in multilingual logo generation.
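The abstract does not specify how the attention maps are aggregated or how core tokens are scored, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that aggregation is an element-wise mean over layers, that a token's score is the attention mass it places on a binary character mask, and that the top-k tokens by that score are kept. The names `aggregate_layers`, `select_core_tokens`, and `char_mask` are hypothetical and chosen for illustration.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # rows: tokens, columns: spatial positions


def aggregate_layers(attn_per_layer: Sequence[Matrix]) -> Matrix:
    """Average attention maps element-wise across layers (assumed aggregation rule)."""
    num_layers = len(attn_per_layer)
    rows = len(attn_per_layer[0])
    cols = len(attn_per_layer[0][0])
    return [
        [sum(layer[r][c] for layer in attn_per_layer) / num_layers for c in range(cols)]
        for r in range(rows)
    ]


def select_core_tokens(agg: Matrix, char_mask: Sequence[int], k: int) -> List[int]:
    """Return indices of the k tokens with the most attention mass on the character mask."""
    # Assumed scoring rule: sum of attention over positions where the mask is 1.
    scores = [sum(a * m for a, m in zip(row, char_mask)) for row in agg]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy data: 2 layers, 3 tokens, 4 positions; the character occupies positions 0 and 1.
char_mask = [1, 1, 0, 0]
layers = [
    [[0.4, 0.4, 0.1, 0.1], [0.1, 0.1, 0.4, 0.4], [0.3, 0.3, 0.2, 0.2]],
    [[0.1, 0.1, 0.4, 0.4], [0.1, 0.1, 0.4, 0.4], [0.4, 0.4, 0.1, 0.1]],
]

aggregated = aggregate_layers(layers)
core_top1 = select_core_tokens(aggregated, char_mask, k=1)
core_top2 = select_core_tokens(aggregated, char_mask, k=2)
```

In this toy input, token 0 attends to the character region in the first layer but shifts away in the second, so a per-layer choice would flip between tokens. Averaging first yields a single ranking across layers, which is the kind of consistency the layer-wise aggregation is described as providing.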