Uncovering Conceptual Blindspots in Generative Image Models Using Sparse Autoencoders

📅 2025-06-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Generative image models often fail to render seemingly simple visual concepts, such as human hands or objects appearing in groups of four, even when those concepts are well represented in their training data; the causes of these "conceptual blindspots" remain poorly understood. This paper introduces RA-SAE, an archetypal sparse autoencoder trained on DINOv2 features with 32,000 human-interpretable concepts (the largest such SAE to date), and uses it to quantify discrepancies in concept prevalence between real and generated images. The method distinguishes suppressed blindspots (concepts under-represented in generations) from exaggerated blindspots (concepts over-represented) and supports instance-level identification of memorization artifacts. Evaluated on Stable Diffusion 1.5/2.1, PixArt, and Kandinsky, the approach surfaces concrete blindspot concepts, including bird feeders and DVD discs, and provides a scalable, theoretically grounded assessment of conceptual fidelity in generative models.

📝 Abstract
Despite their impressive performance, generative image models trained on large-scale datasets frequently fail to produce images with seemingly simple concepts -- e.g., human hands or objects appearing in groups of four -- that are reasonably expected to appear in the training data. These failure modes have largely been documented anecdotally, leaving open the question of whether they reflect idiosyncratic anomalies or more structural limitations of these models. To address this, we introduce a systematic approach for identifying and characterizing "conceptual blindspots" -- concepts present in the training data but absent or misrepresented in a model's generations. Our method leverages sparse autoencoders (SAEs) to extract interpretable concept embeddings, enabling a quantitative comparison of concept prevalence between real and generated images. We train an archetypal SAE (RA-SAE) on DINOv2 features with 32,000 concepts -- the largest such SAE to date -- enabling fine-grained analysis of conceptual disparities. Applied to four popular generative models (Stable Diffusion 1.5/2.1, PixArt, and Kandinsky), our approach reveals specific suppressed blindspots (e.g., bird feeders, DVD discs, and whitespaces on documents) and exaggerated blindspots (e.g., wood background texture and palm trees). At the individual datapoint level, we further isolate memorization artifacts -- instances where models reproduce highly specific visual templates seen during training. Overall, we propose a theoretically grounded framework for systematically identifying conceptual blindspots in generative models by assessing their conceptual fidelity with respect to the underlying data-generating process.
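The core comparison described in the abstract reduces to ranking SAE concepts by their prevalence gap between real and generated images. Below is a minimal sketch of that computation, assuming a trained SAE exposing an `encode` method and a frozen `dinov2` feature extractor; both names are placeholders, not the paper's actual code.

```python
# Hypothetical sketch of the concept-prevalence comparison. `dinov2` and
# `sae.encode` are assumed interfaces standing in for the paper's pipeline.
import torch

@torch.no_grad()
def concept_prevalence(images: torch.Tensor, dinov2, sae) -> torch.Tensor:
    """Mean activation of each SAE concept over a batch of images."""
    feats = dinov2(images)          # (batch, feat_dim) DINOv2 embeddings
    codes = sae.encode(feats)       # (batch, n_concepts) sparse activations
    return codes.clamp(min=0).mean(dim=0)  # (n_concepts,) prevalence

@torch.no_grad()
def blindspot_scores(real, generated, dinov2, sae, top_k: int = 20):
    p_real = concept_prevalence(real, dinov2, sae)
    p_gen = concept_prevalence(generated, dinov2, sae)
    gap = p_real - p_gen  # >0: under-generated; <0: over-generated
    suppressed = torch.topk(gap, top_k).indices   # concepts the model drops
    exaggerated = torch.topk(-gap, top_k).indices # concepts the model inflates
    return gap, suppressed, exaggerated
```

The sign of the gap yields the two blindspot types the paper names: positive gaps mark suppressed concepts, negative gaps mark exaggerated ones.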
Problem

Research questions and friction points this paper is trying to address.

Identifying conceptual blindspots in generative image models
Quantifying disparities between real and generated image concepts
Detecting memorization artifacts in model outputs
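On the last point, one plausible instance-level check, sketched here under assumptions, is a nearest-neighbor search in concept space: a generated image whose concept vector nearly duplicates that of a specific training image suggests a memorized visual template. `gen_codes` and `train_codes` are assumed to be precomputed SAE concept activations; the cosine threshold is illustrative, and the paper's actual criterion may differ.

```python
# Hypothetical sketch: flagging possible memorization via nearest-neighbor
# similarity in SAE concept space. Threshold and inputs are illustrative.
import torch
import torch.nn.functional as F

@torch.no_grad()
def flag_memorization(gen_codes: torch.Tensor,
                      train_codes: torch.Tensor,
                      threshold: float = 0.95):
    """Indices of generations whose concept vector nearly duplicates
    some training image's concept vector, plus the matched neighbors."""
    gen_n = F.normalize(gen_codes, dim=1)      # (G, n_concepts)
    train_n = F.normalize(train_codes, dim=1)  # (T, n_concepts)
    sims = gen_n @ train_n.T                   # (G, T) cosine similarities
    best, nn_idx = sims.max(dim=1)             # closest training image each
    suspects = (best > threshold).nonzero(as_tuple=True)[0]
    return suspects, nn_idx[suspects]
```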
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses sparse autoencoders for interpretable concept embeddings
Trains the largest such SAE to date: 32,000 concepts on DINOv2 features
Quantifies conceptual disparities in generative models
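For context on the first two items, a sparse autoencoder over frozen DINOv2 features might look like the sketch below. This is a plain TopK SAE used as a simplified stand-in; the paper's RA-SAE is an archetypal variant with additional constraints on the dictionary, and all sizes and the `k` value here are illustrative.

```python
# Minimal TopK sparse autoencoder over frozen DINOv2 features -- a simplified
# stand-in for the paper's RA-SAE. Dimensions and k are illustrative.
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    def __init__(self, feat_dim: int = 768, n_concepts: int = 32_000, k: int = 32):
        super().__init__()
        self.encoder = nn.Linear(feat_dim, n_concepts)
        self.decoder = nn.Linear(n_concepts, feat_dim)
        self.k = k  # number of concepts allowed to fire per input

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        z = torch.relu(self.encoder(x))
        # keep only the k largest activations per sample (sparsity constraint)
        topk = torch.topk(z, self.k, dim=-1)
        return torch.zeros_like(z).scatter_(-1, topk.indices, topk.values)

    def forward(self, x: torch.Tensor):
        codes = self.encode(x)
        return self.decoder(codes), codes

# Training objective: plain reconstruction of the frozen features, e.g.
#   recon, codes = sae(feats); loss = F.mse_loss(recon, feats)
```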