A Comprehensive Survey on Visual Concept Mining in Text-to-image Diffusion Models

📅 2025-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Text-to-image diffusion models suffer from limited controllability due to the weak representational capacity of textual conditioning signals. To address this, we propose Visual Concept Mining (VCM) as a systematic research framework, introducing a four-dimensional taxonomy—concept learning, erasure, decomposition, and composition—to unify cross-method principles for the first time, and explicitly identify editability, generalizability, and interpretability as three core challenges and future directions. Methodologically, we integrate personalized fine-tuning, attention reweighting, latent-space disentanglement, and reference-image guidance to enable multi-granularity visual concept modeling within diffusion pipelines. We further construct the first VCM knowledge graph—a structured, systematic resource—that provides theoretical foundations and practical guidelines for controllable generation. This work advances text-to-image synthesis from purely “text-driven” generation toward “text–vision co-driven” generation.
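The summary mentions attention reweighting as one family of VCM techniques. As a rough illustration only (a toy sketch, not any specific method from the paper), per-token attention reweighting in a single-head cross-attention layer can be approximated by scaling each text token's post-softmax attention weight and renormalizing, which amplifies or suppresses that token's influence on the generated pixels:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def reweighted_cross_attention(queries, keys, values, token_scales):
    """Toy cross-attention in which each text token's post-softmax
    attention weight is multiplied by a user-chosen scale and then
    renormalized per pixel.

    queries:      (n_pixels, d) image-side features
    keys, values: (n_tokens, d) text-side features
    token_scales: (n_tokens,)   per-token emphasis factors
    """
    logits = queries @ keys.T / np.sqrt(queries.shape[-1])
    weights = softmax(logits, axis=-1)       # (n_pixels, n_tokens)
    weights = weights * token_scales         # reweight each token
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))   # 4 "pixels"
k = rng.normal(size=(3, 8))   # 3 text tokens
v = rng.normal(size=(3, 8))

# Suppress token 0 entirely; leave tokens 1 and 2 untouched.
out, w = reweighted_cross_attention(q, k, v, np.array([0.0, 1.0, 1.0]))
```

Scaling after the softmax (rather than scaling the logits) keeps the intervention interpretable: a scale of 0 removes a token's contribution, a scale above 1 strengthens it, and renormalization keeps each pixel's weights a valid distribution.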

📝 Abstract
Text-to-image diffusion models have made significant advancements in generating high-quality, diverse images from text prompts. However, the inherent limitations of textual signals often prevent these models from fully capturing specific concepts, thereby reducing their controllability. To address this issue, several approaches have incorporated personalization techniques, utilizing reference images to mine visual concept representations that complement textual inputs and enhance the controllability of text-to-image diffusion models. Despite these advances, a comprehensive, systematic exploration of visual concept mining remains limited. In this paper, we categorize existing research into four key areas: Concept Learning, Concept Erasing, Concept Decomposition, and Concept Combination. This classification provides valuable insights into the foundational principles of Visual Concept Mining (VCM) techniques. Additionally, we identify key challenges and propose future research directions to propel this important and interesting field forward.
Problem

Research questions and friction points this paper is trying to address.

Enhancing controllability in text-to-image diffusion models
Exploring visual concept mining for better image generation
Addressing limitations of textual signals in concept capture
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes reference images for visual concept mining
Enhances controllability in text-to-image diffusion models
Categorizes research into four key VCM areas
Ziqiang Li
Associate Professor, Nanjing University of Information Science and Technology
AIGC · Backdoor Learning · AI Security
Jun Li
School of Computer Science, Nanjing University of Information Science and Technology
Lizhi Xiong
Nanjing University of Information Science & Technology
AI Security · Multimedia Processing
Zhangjie Fu
School of Computer Science, Nanjing University of Information Science and Technology
Zechao Li
School of Computer Science, Nanjing University of Science and Technology