🤖 AI Summary
Text-to-image diffusion models suffer from limited controllability because textual conditioning signals have weak representational capacity. To address this, we propose Visual Concept Mining (VCM) as a systematic research framework and introduce a four-category taxonomy (concept learning, erasing, decomposition, and combination) that unifies cross-method principles for the first time. We explicitly identify editability, generalizability, and interpretability as three core challenges and future directions. Methodologically, we integrate personalized fine-tuning, attention reweighting, latent-space disentanglement, and reference-image guidance to enable multi-granularity visual concept modeling within diffusion pipelines. We further construct the first VCM knowledge graph, a structured, systematic resource that provides theoretical foundations and practical guidelines for controllable generation. This work advances text-to-image synthesis from purely “text-driven” generation toward “text–vision co-driven” generation.
📝 Abstract
Text-to-image diffusion models have advanced significantly in generating high-quality, diverse images from text prompts. However, the inherent limitations of textual signals often prevent these models from fully capturing specific concepts, which reduces their controllability. To address this issue, several approaches have incorporated personalization techniques that use reference images to mine visual concept representations, complementing textual inputs and enhancing the controllability of text-to-image diffusion models. Despite these advances, visual concept mining has not yet been explored comprehensively and systematically. In this paper, we categorize existing research into four key areas: Concept Learning, Concept Erasing, Concept Decomposition, and Concept Combination. This classification provides insight into the foundational principles of Visual Concept Mining (VCM) techniques. Additionally, we identify key challenges and propose future research directions to move the field forward.