Mirror in the Model: Ad Banner Image Generation via Reflective Multi-LLM and Multi-modal Agents

📅 2025-07-04
🤖 AI Summary
Existing banner generation models achieve high visual fidelity but struggle to satisfy commercial design requirements such as structured layout, precise typography, and brand consistency. To address this, we propose MIMO, an end-to-end generative framework that combines multimodal agents with reflective optimization. MIMO uses a hierarchical agent architecture that integrates collaborative planning by multiple large language models (LLMs), multimodal understanding, diffusion-based image generation, and iterative reflective reasoning. Given only a natural-language prompt and a logo image, it automatically detects design flaws and optimizes layout, typography, and brand elements (e.g., color palette, logo placement). Evaluated on real-world advertising benchmarks, MIMO significantly outperforms state-of-the-art diffusion models and LLM-based baselines in both visual quality and design compliance, including alignment, spacing, and brand-color consistency.

📝 Abstract
Recent generative models such as GPT-4o have shown strong capabilities in producing high-quality images with accurate text rendering. However, commercial design tasks like advertising banners demand more than visual fidelity: they require, among other things, structured layouts, precise typography, and consistent branding. In this paper, we introduce MIMO (Mirror In-the-Model), an agentic refinement framework for automatic ad banner generation. MIMO combines a hierarchical multi-modal agent system (MIMO-Core) with a coordination loop (MIMO-Loop) that explores multiple stylistic directions and iteratively improves design quality. Requiring only a simple natural-language prompt and a logo image as input, MIMO automatically detects and corrects multiple types of errors during generation. Experiments show that MIMO significantly outperforms existing diffusion and LLM-based baselines in real-world banner design scenarios.
Problem

Research questions and friction points this paper is trying to address.

Generating ad banners with structured layouts and branding
Improving design quality via multi-agent refinement
Correcting errors in automatic ad banner generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical multi-modal agent system for ad banners
Coordination loop exploring multiple stylistic directions
Automatic error detection and correction during generation
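
The combination listed above (several stylistic directions, each refined by a detect-and-correct loop) can be illustrated with a toy sketch. This is not the authors' implementation: the names `Banner`, `STYLE_FLAWS`, `generate`, `critique`, `revise`, `refine`, and `explore_styles` are all hypothetical, and the generator and critic are plain-Python stubs standing in for the image model and the multimodal reviewer agents. It only shows the control flow of drafting one candidate per style, refining each until the critic reports no flaws or a round budget runs out, and keeping the candidate with the fewest remaining flaws.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical starting flaws per style; a real system would discover these
# by inspecting a generated image, not by table lookup.
STYLE_FLAWS = {
    "minimal": ["text misaligned"],
    "bold": ["logo misplaced", "text misaligned", "off-brand color"],
}


@dataclass
class Banner:
    style: str
    issues: List[str] = field(default_factory=list)  # outstanding design flaws
    revisions: int = 0  # number of correction rounds applied so far


def generate(prompt: str, style: str) -> Banner:
    """Stub for the image generator: returns a first draft with known flaws."""
    return Banner(style=style, issues=list(STYLE_FLAWS.get(style, [])))


def critique(banner: Banner) -> List[str]:
    """Stub for the reviewer agent: reports the flaws it can see."""
    return list(banner.issues)


def revise(banner: Banner, feedback: List[str]) -> Banner:
    """Stub for the corrector: fixes the first reported flaw per round."""
    remaining = [issue for issue in banner.issues if issue != feedback[0]]
    return Banner(banner.style, remaining, banner.revisions + 1)


def refine(banner: Banner, max_rounds: int) -> Banner:
    """Detect-and-correct loop: stop when clean or when the budget is spent."""
    for _ in range(max_rounds):
        feedback = critique(banner)
        if not feedback:
            break
        banner = revise(banner, feedback)
    return banner


def explore_styles(prompt: str, styles: List[str], max_rounds: int = 2) -> Banner:
    """Refine one candidate per style and keep the one with the fewest flaws."""
    candidates = [refine(generate(prompt, s), max_rounds) for s in styles]
    return min(candidates, key=lambda b: len(b.issues))


best = explore_styles("summer sale banner", ["bold", "minimal"])
print(best.style, best.issues, best.revisions)
```

In this toy setup the round budget matters: with `max_rounds=2` the "bold" draft still has one flaw left, so the "minimal" candidate is selected. A real critic would return graded, free-form feedback, so the selection rule would likely need a richer score than a flaw count.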
Zhao Wang
Sony Group Corporation, Japan
Bowen Chen
Sony Group Corporation, Japan; The University of Tokyo, Japan
Yotaro Shimose
Sony Group Corporation, Japan
Sota Moriyama
Sony Group Corporation, Japan; The Graduate University for Advanced Studies, Japan
Heng Wang
Sony Group Corporation, Japan
Shingo Takamatsu
Sony Group Corporation