🤖 AI Summary
This paper systematically defines and empirically investigates the “default image” phenomenon in text-to-image (TTI) generation—where models produce highly similar, redundant outputs for unknown, semantically ambiguous, or neologistic prompts. Conducting black-box interaction experiments on Midjourney, we employ manually crafted invalid prompts, quantitative image similarity analysis (e.g., CLIP-based embedding distance), and a structured user survey with statistical hypothesis testing (t-tests, ANOVA). Our key contributions are: (1) a reproducible methodology to trigger default images; (2) empirical confirmation of their high cross-prompt consistency; (3) statistically significant evidence that default images substantially degrade user-perceived credibility and practical utility of generated outputs (p < 0.01); and (4) establishment of a novel benchmark and research agenda for prompt robustness, model interpretability, and controllable generation in TTI systems.
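The cross-prompt consistency measurement described above can be sketched as pairwise cosine similarity over image embeddings. The snippet below is a minimal illustration, assuming embeddings (e.g., CLIP image features) have already been extracted; the array values here are placeholders, not data from the paper.

```python
import numpy as np

def pairwise_cosine_similarity(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between row-wise image embeddings."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return normed @ normed.T

# Illustrative placeholder embeddings (in practice: CLIP features of generated images).
emb = np.array([[1.0, 0.0],
                [0.9, 0.1],
                [0.0, 1.0]])
sim = pairwise_cosine_similarity(emb)
# Near-1.0 off-diagonal entries flag near-duplicate outputs across unrelated prompts,
# the signature of a "default image".
```

A high mean off-diagonal similarity across images generated from unrelated invalid prompts would indicate the redundancy the paper reports.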
📝 Abstract
In the creative practice of text-to-image (TTI) generation, images are generated from text prompts. However, TTI models are trained to always yield an output, even if the prompt contains unknown terms. In such cases, the model may generate what we call "default images": images that closely resemble each other across many unrelated prompts. We argue that studying default images is valuable for designing better solutions for TTI and prompt engineering. In this paper, we provide the first investigation into default images on Midjourney, a popular image generator. We describe our systematic approach to creating input prompts that trigger default images, and present the results of our initial experiments and several small-scale ablation studies. We also report on a survey study investigating how default images affect user satisfaction. Our work lays the foundation for understanding default images in TTI and highlights challenges and future research directions.