🤖 AI Summary
This work addresses the need for automated generation of compliant, high-fidelity, and creatively diverse images in brand marketing contexts by proposing the first fully automatic text-to-image generation pipeline that jointly optimizes brand safety, image quality, and human preference. The system integrates state-of-the-art text-to-image models with a DINOv2-based image quality assessment module and incorporates a lightweight human feedback mechanism to dynamically refine outputs. Experimental results show that, compared with baseline methods, the proposed approach improves marketing-object fidelity by 30.77% (measured with DINOv2) and increases human preference by 52.00%, balancing automation efficiency with brand compliance in large-scale production environments.
📝 Abstract
Text-to-image models have made significant strides and now produce impressive images from textual descriptions. However, building a scalable pipeline for deploying these models in production remains a challenge. Striking the right balance between automation and human feedback is critical to maintaining both scale and quality: automation can handle large volumes, but human oversight is still essential to ensure that the generated images meet the desired standards and align with the creative vision. This paper presents a new pipeline that offers a fully automated, scalable solution for generating marketing images of commercial products using text-to-image models. The proposed system preserves image quality and fidelity while introducing enough creative variation to satisfy marketing guidelines. By streamlining this process, we achieve a seamless blend of efficiency and human oversight, with a 30.77% increase in marketing-object fidelity (measured with DINOv2) and a 52.00% increase in human preference for the generated images.
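The abstract does not say exactly how DINOv2 fidelity is computed. One common convention, the DINO score popularized by DreamBooth, is the mean pairwise cosine similarity between DINO embeddings of generated images and reference images of the same object. The sketch below assumes that convention, treats the DINOv2 embeddings as already-extracted vectors (feature extraction is omitted), and shows how a relative percentage increase such as the reported 30.77% would be derived from a baseline score and a new score. The function names are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_a * norm_b)


def dino_fidelity(generated: Sequence[Vector], references: Sequence[Vector]) -> float:
    """Hypothetical DreamBooth-style DINO score: the mean cosine similarity
    over every (generated, reference) embedding pair. Assumes the vectors
    are precomputed DINOv2 features of the images."""
    scores = [cosine_similarity(g, r) for g in generated for r in references]
    return sum(scores) / len(scores)


def relative_increase(new: float, baseline: float) -> float:
    """Percentage increase of `new` over `baseline`."""
    return (new - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Toy 2-D vectors stand in for real DINOv2 embeddings.
    generated = [[1.0, 0.0]]
    references = [[1.0, 0.0], [0.0, 1.0]]
    print(dino_fidelity(generated, references))
    print(relative_increase(1.5, 1.0))
```

Comparing the score of the proposed pipeline with that of a baseline via `relative_increase` yields a percentage of the kind quoted above; the toy inputs here are illustrative only and do not reproduce the paper's data.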