A Comprehensive Survey on Visual Concept Mining in Text-to-image Diffusion Models

📅 2025-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Text-to-image diffusion models suffer from limited controllability due to the weak representational capacity of textual conditioning signals. To address this, we propose Visual Concept Mining (VCM) as a systematic research framework, introducing a four-dimensional taxonomy—concept learning, erasure, decomposition, and composition—to unify cross-method principles for the first time, and explicitly identify editability, generalizability, and interpretability as three core challenges and future directions. Methodologically, we integrate personalized fine-tuning, attention reweighting, latent-space disentanglement, and reference-image guidance to enable multi-granularity visual concept modeling within diffusion pipelines. We further construct the first VCM knowledge graph—a structured, systematic resource—that provides theoretical foundations and practical guidelines for controllable generation. This work advances text-to-image synthesis from purely “text-driven” generation toward “text–vision co-driven” generation.
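The summary mentions attention reweighting as one family of VCM techniques. As a rough illustration only (a toy sketch, not any specific method from the paper), per-token attention reweighting in a single-head cross-attention layer can be approximated by scaling each text token's post-softmax attention weight and renormalizing, which amplifies or suppresses that token's influence on the generated pixels:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def reweighted_cross_attention(queries, keys, values, token_scales):
    """Toy cross-attention in which each text token's post-softmax
    attention weight is multiplied by a user-chosen scale and then
    renormalized per pixel.

    queries:      (n_pixels, d) image-side features
    keys, values: (n_tokens, d) text-side features
    token_scales: (n_tokens,)   per-token emphasis factors
    """
    logits = queries @ keys.T / np.sqrt(queries.shape[-1])
    weights = softmax(logits, axis=-1)       # (n_pixels, n_tokens)
    weights = weights * token_scales         # reweight each token
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))   # 4 "pixels"
k = rng.normal(size=(3, 8))   # 3 text tokens
v = rng.normal(size=(3, 8))

# Suppress token 0 entirely; leave tokens 1 and 2 untouched.
out, w = reweighted_cross_attention(q, k, v, np.array([0.0, 1.0, 1.0]))
```

Scaling after the softmax (rather than scaling the logits) keeps the intervention interpretable: a scale of 0 removes a token's contribution, a scale above 1 strengthens it, and renormalization keeps each pixel's weights a valid distribution.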

📝 Abstract
Text-to-image diffusion models have made significant advancements in generating high-quality, diverse images from text prompts. However, the inherent limitations of textual signals often prevent these models from fully capturing specific concepts, thereby reducing their controllability. To address this issue, several approaches have incorporated personalization techniques, utilizing reference images to mine visual concept representations that complement textual inputs and enhance the controllability of text-to-image diffusion models. Despite these advances, a comprehensive, systematic exploration of visual concept mining remains limited. In this paper, we categorize existing research into four key areas: Concept Learning, Concept Erasing, Concept Decomposition, and Concept Combination. This classification provides valuable insights into the foundational principles of Visual Concept Mining (VCM) techniques. Additionally, we identify key challenges and propose future research directions to propel this important and interesting field forward.
Problem

Research questions and friction points this paper is trying to address.

Enhancing controllability in text-to-image diffusion models
Exploring visual concept mining for better image generation
Addressing limitations of textual signals in concept capture
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes reference images for visual concept mining
Enhances controllability in text-to-image diffusion models
Categorizes research into four key VCM areas
Ziqiang Li
Associate Professor, Nanjing University of Information Science and Technology
AIGC · Backdoor Learning · AI Security
Jun Li
School of Computer Science, Nanjing University of Information Science and Technology
Lizhi Xiong
Nanjing University of Information Science & Technology
AI Security · Multimedia Processing
Zhangjie Fu
School of Computer Science, Nanjing University of Information Science and Technology
Zechao Li
School of Computer Science, Nanjing University of Science and Technology