🤖 AI Summary
Existing banner generation models achieve high visual fidelity but struggle to satisfy commercial design requirements such as structured layout, precise typography, and brand consistency. To address this, we propose MIMO, an end-to-end generative framework built on multimodal agents and reflective optimization. MIMO uses a hierarchical agent architecture that integrates collaborative planning among multiple large language models (LLMs), multimodal understanding, diffusion-based image generation, and iterative reflective reasoning. Given only a natural-language prompt and a logo image, it automatically detects design flaws and optimizes layout, typography, and brand elements (e.g., color palette, logo placement). On real-world advertising benchmarks, MIMO significantly outperforms state-of-the-art diffusion models and LLM-based baselines in both visual quality and design compliance, including alignment, spacing, and brand-color consistency.
📝 Abstract
Recent generative models such as GPT-4o have shown strong capabilities in producing high-quality images with accurate text rendering. However, commercial design tasks like advertising banners demand more than visual fidelity: they also require structured layouts, precise typography, and consistent branding. In this paper, we introduce MIMO (Mirror In-the-Model), an agentic refinement framework for automatic ad banner generation. MIMO combines a hierarchical multimodal agent system (MIMO-Core) with a coordination loop (MIMO-Loop) that explores multiple stylistic directions and iteratively improves design quality. Requiring only a simple natural-language prompt and a logo image as input, MIMO automatically detects and corrects multiple types of errors during generation. Experiments show that MIMO significantly outperforms existing diffusion- and LLM-based baselines in real-world banner design scenarios.
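The abstract describes a coordination loop that detects design flaws and iteratively corrects them. As a rough intuition for that generate-critique-refine pattern, here is a minimal Python sketch; every class, function, and threshold below is an illustrative assumption for exposition, not the paper's actual API or architecture.

```python
# Minimal sketch of an agentic generate-critique-refine loop in the spirit of
# the MIMO-Loop described above. All names and scores here are hypothetical
# stand-ins, not the paper's real components.
from dataclasses import dataclass


@dataclass
class Banner:
    layout: str
    typography: str
    brand_score: float  # 0.0 (off-brand) .. 1.0 (fully on-brand); assumed metric


def generate(prompt: str, logo: str) -> Banner:
    # Stand-in for the diffusion-based generator: produce a first draft.
    return Banner(layout="rough", typography="default", brand_score=0.4)


def critique(banner: Banner) -> list[str]:
    # Stand-in for the multimodal critic: return detected design flaws.
    flaws = []
    if banner.brand_score < 0.9:
        flaws.append("brand-color mismatch")
    if banner.layout == "rough":
        flaws.append("misaligned layout")
    return flaws


def refine(banner: Banner, flaws: list[str]) -> Banner:
    # Stand-in for the corrective edit step: address each reported flaw.
    if "misaligned layout" in flaws:
        banner.layout = "grid-aligned"
    if "brand-color mismatch" in flaws:
        banner.brand_score = min(1.0, banner.brand_score + 0.3)
    return banner


def refinement_loop(prompt: str, logo: str, max_iters: int = 5) -> Banner:
    # Iterate generation and critique until no flaws remain or budget runs out.
    banner = generate(prompt, logo)
    for _ in range(max_iters):
        flaws = critique(banner)
        if not flaws:
            break
        banner = refine(banner, flaws)
    return banner
```

With these toy components, `refinement_loop("summer sale banner", "logo.png")` converges in a few iterations to a draft the critic no longer flags; the real system would replace each stand-in with an LLM planner, a multimodal critic, and a diffusion generator.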