🤖 AI Summary
Existing generative concept bottleneck models (CBMs) rely on auxiliary visual cues to compensate for information the concepts fail to capture, which undermines interpretability and compositional reasoning. This paper introduces CoCo-Bot, a post-hoc, composable CBM for generative modeling that operates without auxiliary visual inputs: all generative information flows exclusively through explicit, human-understandable concepts. Its core mechanism pairs a pre-trained StyleGAN2 generator with diffusion-based energy functions that guide purely concept-driven image synthesis. This design supports robust, human-interpretable post-hoc interventions, including composition and negation across arbitrary concepts. Evaluated with StyleGAN2 pre-trained on CelebA-HQ, CoCo-Bot improves concept-level controllability, interpretability, and editing flexibility while maintaining competitive visual quality, yielding an end-to-end, intervenable, and compositionally expressive concept bottleneck generator.
📝 Abstract
Concept Bottleneck Models (CBMs) provide interpretable and controllable generative modeling by routing generation through explicit, human-understandable concepts. However, previous generative CBMs often rely on auxiliary visual cues at the bottleneck to compensate for information not captured by the concepts, which undermines interpretability and compositionality. We propose CoCo-Bot, a post-hoc, composable concept bottleneck generative model that eliminates the need for auxiliary cues by transmitting all information solely through explicit concepts. Guided by diffusion-based energy functions, CoCo-Bot supports robust post-hoc interventions, such as concept composition and negation, across arbitrary concepts. Experiments using StyleGAN2 pre-trained on CelebA-HQ show that CoCo-Bot improves concept-level controllability and interpretability, while maintaining competitive visual quality.
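To make the composition and negation mechanics concrete, below is a minimal sketch of energy-based concept composition with Langevin-style latent refinement. Everything named here (`make_concept_energy`, `E_smiling`, `E_glasses`, `latent_dim`, the decode call) is an illustrative assumption rather than the paper's implementation: CoCo-Bot's energies are diffusion-based and learned, whereas these are toy linear stand-ins.

```python
import torch

# Minimal sketch (not the paper's implementation): toy linear per-concept
# energy functions E_i(z), where lower energy means the latent z expresses
# concept i more strongly. The concept directions here are random stand-ins.
def make_concept_energy(dim: int, seed: int):
    g = torch.Generator().manual_seed(seed)
    direction = torch.randn(dim, generator=g)
    direction = direction / direction.norm()
    return lambda z: -(z * direction).sum(dim=-1)  # low when z aligns with concept

latent_dim = 512  # assumed size of a StyleGAN2 w-space latent
E_smiling = make_concept_energy(latent_dim, seed=0)  # hypothetical concept
E_glasses = make_concept_energy(latent_dim, seed=1)  # hypothetical concept

def composed_energy(z: torch.Tensor) -> torch.Tensor:
    # Composition (AND): sum the energies of the concepts to enforce.
    # Negation (NOT): subtract the energy of the concept to suppress.
    return E_smiling(z) - E_glasses(z)  # "smiling AND NOT glasses"

# Langevin-style refinement: step down the energy gradient with small noise.
z = torch.randn(1, latent_dim, requires_grad=True)
step_size, noise_scale = 0.1, 0.01
for _ in range(50):
    energy = composed_energy(z).sum()
    (grad,) = torch.autograd.grad(energy, z)
    with torch.no_grad():
        z -= step_size * grad
        z += noise_scale * torch.randn_like(z)

# The refined latent would then be decoded by the frozen pre-trained
# generator, e.g. image = stylegan2(z)  # hypothetical decode call
```

Under these assumptions, swapping the sign on a concept's energy term is all it takes to flip a composition into a negation, which is what makes such interventions apply uniformly across arbitrary concepts.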