🤖 AI Summary
Existing methods struggle to find semantically coherent paths between distant visual concepts in nonlinear latent spaces, so the blends they produce often lack semantic consistency and meaning. To address this, we propose the "Vibe Blending" task and introduce Vibe Space, a hierarchical graph manifold built on features such as CLIP, which learns low-dimensional geodesics that connect cross-domain concepts through their shared semantic attributes ("vibes"). Our approach integrates multimodal alignment, manifold optimization, and a cognitively inspired path-difficulty score. The evaluation framework combines human creativity judgments, large language model (LLM) reasoning, and this geometric difficulty score. In our experiments, humans consistently rate Vibe Space blends as more creative and more coherent than those produced by current methods.
📝 Abstract
Creating new visual concepts often requires connecting distinct ideas through their most relevant shared attributes -- their vibe. We introduce Vibe Blending, a novel task for generating coherent and meaningful hybrids that reveal these shared attributes between images. Achieving such blends is challenging for current methods, which struggle to identify and traverse nonlinear paths linking distant concepts in latent space. We propose Vibe Space, a hierarchical graph manifold that learns low-dimensional geodesics in feature spaces like CLIP, enabling smooth and semantically consistent transitions between concepts. To evaluate creative quality, we design a cognitively inspired framework combining human judgments, LLM reasoning, and a geometric path-based difficulty score. We find that Vibe Space produces blends that humans consistently rate as more creative and coherent than current methods.
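To make the contrast between a straight line in feature space and a path that follows the data more concrete, here is a minimal toy sketch. It is not the Vibe Space method: it assumes a plain k-nearest-neighbour graph over hand-made 2D points on a half circle (standing in for real features such as CLIP embeddings) and uses Dijkstra's algorithm as a generic graph geodesic. The names `knn_graph`, `shortest_path`, and `points` are hypothetical and chosen only for illustration.

```python
import heapq
import math


def knn_graph(points, k):
    """Connect each point to its k nearest neighbours (undirected, Euclidean weights)."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (list of node indices, total path length)."""
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > best.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < best.get(v, math.inf):
                best[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in best:
        raise ValueError("target is not reachable from source")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, best[target]


# Toy "feature space" (an assumption): 21 points on the upper half of the unit circle.
points = [(math.cos(math.pi * t / 20), math.sin(math.pi * t / 20)) for t in range(21)]

graph = knn_graph(points, k=2)
path, geodesic_length = shortest_path(graph, 0, 20)

# Straight-line interpolation between the two endpoints, for comparison.
chord_length = math.dist(points[0], points[20])
midpoint = tuple((a + b) / 2 for a, b in zip(points[0], points[20]))

print("graph path:", path)
print("graph path length:", geodesic_length, "straight-line length:", chord_length)
print("norm of straight-line midpoint:", math.hypot(*midpoint))
```

In this toy setting, every node on the graph path is one of the data points on the circle, whereas the midpoint of the straight line falls far from any of them. That is the intuition behind preferring geodesic-style paths when blending distant concepts; how Vibe Space actually constructs and learns its hierarchical graph is not specified here.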