🤖 AI Summary
Auditing implicit biases in text-to-image (T2I) models has lacked human-interpretable explanatory mechanisms. To address this, we propose Concept2Concept, a framework that represents a model's conditional distribution in a semantically grounded concept space, mapping prompt–image associations onto interpretable vision-language concepts. The framework supports two auditing paths, user-defined prompt distributions and empirical, real-world distributions, and integrates concept-space modeling, quantitative analysis of conditional distributions, and vision-language alignment evaluation. We release an open-source, interactive web tool built with D3.js and Gradio. Empirical evaluation across multiple state-of-the-art T2I models reveals pronounced conceptual biases, e.g., strong gender–occupation and race–scene associations. The tool has been widely adopted by the research community, with over 5,000 demo accesses to date.
📝 Abstract
Text-to-image (T2I) models are increasingly used in impactful real-life applications. As such, there is a growing need to audit these models to ensure that they generate desirable, task-appropriate images. However, systematically inspecting the associations between prompts and generated content in a human-understandable way remains challenging. To address this, we propose Concept2Concept, a framework in which we characterize the conditional distributions of vision-language models using interpretable concepts and metrics defined in terms of those concepts. This characterization allows us to use our framework to audit both models and prompt datasets. To demonstrate, we investigate several case studies of conditional distributions of prompts, such as user-defined distributions and empirical, real-world distributions. Lastly, we implement Concept2Concept as an open-source interactive visualization tool to facilitate use by non-technical end-users. A demo is available at https://tinyurl.com/Concept2ConceptDemo.
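The core idea of characterizing a conditional distribution in concept space can be illustrated with a minimal sketch. Assuming some concept detector has already tagged each generated image with a list of concepts (the detector, prompt, and concept names below are illustrative, not from the paper), the empirical distribution P(concept | prompt set) is just a normalized frequency count:

```python
from collections import Counter

def concept_distribution(detected_concepts):
    """Given per-image lists of detected concepts, return the empirical
    conditional distribution P(concept | prompt set) as a dict."""
    counts = Counter(c for image in detected_concepts for c in image)
    total = sum(counts.values())
    return {concept: n / total for concept, n in counts.items()}

# Toy example: concepts a hypothetical detector might return for four
# images generated from the prompt "a photo of a doctor".
images = [
    ["person", "stethoscope", "man"],
    ["person", "stethoscope", "man"],
    ["person", "lab coat", "man"],
    ["person", "stethoscope", "woman"],
]
dist = concept_distribution(images)
# Skew between dist["man"] and dist["woman"] is the kind of
# gender-occupation association an audit would surface.
```

Comparing such distributions across prompts, or against a reference distribution, is what turns raw generations into an auditable, human-readable summary.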