MEPG: Multi-Expert Planning and Generation for Compositionally-Rich Image Generation

📅 2025-09-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Text-to-image diffusion models face significant bottlenecks in handling complex, multi-object prompts and in producing stylistically diverse outputs. Method: We propose the Multi-Expert Planning and Generation (MEPG) framework, which employs a location- and style-aware large language model (LLM) for fine-grained semantic instruction decomposition, coupled with spatial-semantic expert modules for joint layout planning and style synthesis. A dynamic expert routing mechanism and attention-based gating enable localized, personalized generation while preserving global coherence. The method integrates LLM fine-tuning, a multi-expert diffusion architecture, and cross-region generation techniques, and it supports extensibility and interactive editing. Contribution/Results: Experiments show that, with the same backbone model, MEPG improves the structural accuracy and stylistic diversity of generated images and outperforms baseline models in both quantitative and qualitative evaluations.

📝 Abstract

Text-to-image diffusion models have achieved remarkable image quality, but they still struggle with complex, multi-element prompts and limited stylistic diversity. To address these limitations, we propose a Multi-Expert Planning and Generation Framework (MEPG) that synergistically integrates position- and style-aware large language models (LLMs) with spatial-semantic expert modules. The framework comprises two core components: (1) a Position-Style-Aware (PSA) module that utilizes a supervised fine-tuned LLM to decompose input prompts into precise spatial coordinates and style-encoded semantic instructions; and (2) a Multi-Expert Diffusion (MED) module that implements cross-region generation through dynamic expert routing across both local regions and global areas. During the generation process for each local region, specialized models (e.g., realism experts, stylization specialists) are selectively activated for each spatial partition via attention-based gating mechanisms. The architecture supports lightweight integration and replacement of expert models, providing strong extensibility. Additionally, an interactive interface enables real-time spatial layout editing and per-region style selection from a portfolio of experts. Experiments show that MEPG significantly outperforms baseline models with the same backbone in both image quality and style diversity.
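
The two-stage idea in the abstract (a planner emits per-region boxes and style tags, then a gate weights experts per region) can be illustrated with a toy sketch. This is not the authors' implementation: the `RegionPlan` structure, the affinity table, the expert names, and the use of a plain softmax as a stand-in for the attention-based gate are all assumptions made for illustration, and the expert "outputs" are single numbers rather than diffusion latents.

```python
from dataclasses import dataclass
from math import exp


@dataclass(frozen=True)
class RegionPlan:
    """Hypothetical planner output: a sub-prompt, a box, and a style tag."""
    prompt: str
    box: tuple  # (x0, y0, x1, y1), normalized to [0, 1]
    style: str


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate(plan, affinities):
    """Return per-expert weights for one region.

    `affinities` maps expert name -> {style: score}. The softmax here is an
    assumed stand-in for the attention-based gating named in the abstract.
    """
    names = sorted(affinities)
    weights = softmax([affinities[name].get(plan.style, 0.0) for name in names])
    return dict(zip(names, weights))


def box_mask(box, height, width):
    """Binary mask marking grid cells whose centers fall inside the box."""
    x0, y0, x1, y1 = box
    return [
        [1 if x0 <= (c + 0.5) / width < x1 and y0 <= (r + 0.5) / height < y1 else 0
         for c in range(width)]
        for r in range(height)
    ]


def blend(expert_outputs, weights):
    """Weighted sum of scalar expert outputs for one region."""
    return sum(weights[name] * expert_outputs[name] for name in weights)


# Hypothetical plan for a two-element prompt.
plans = [
    RegionPlan("a red fox", (0.0, 0.0, 0.5, 1.0), "photo"),
    RegionPlan("a castle", (0.5, 0.0, 1.0, 1.0), "watercolor"),
]
# Hypothetical style affinities for two experts.
AFFINITIES = {
    "realism_expert": {"photo": 2.0, "watercolor": 0.0},
    "stylization_expert": {"photo": 0.0, "watercolor": 2.0},
}

fox_weights = gate(plans[0], AFFINITIES)
fox_mask = box_mask(plans[0].box, height=4, width=4)
```

In this sketch the gate produces soft weights rather than a hard choice, so adding or replacing an expert only requires a new entry in the affinity table. How MEPG actually scores experts and merges local regions with the global area is not specified in the abstract.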

Problem

Research questions and friction points this paper is trying to address.

Addresses complex multi-element prompt challenges in image generation

Enhances stylistic diversity in text-to-image diffusion models

Improves spatial layout precision through expert module integration

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Expert Planning and Generation Framework

Position-Style-Aware LLM for prompt decomposition

Multi-Expert Diffusion with dynamic expert routing

🔎 Similar Papers

DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation