🤖 AI Summary
Current AIGC systems struggle to model and align with users' personalized preferences because unified preference data, modeling paradigms, and evaluation protocols are lacking. To address this, we propose MagicWand, the first general-purpose generative agent explicitly designed for user preference alignment. Our approach introduces UniPrefer-100K, a large-scale cross-modal preference dataset, and UniPreferBench, a corresponding benchmark. We further design an integrated framework that combines prompt enhancement, preference-driven generation, and automatic refinement. The method supports multimodal tasks, including image and video generation, while preserving output quality and significantly improving preference consistency. Experiments demonstrate that MagicWand consistently outperforms all baselines on UniPreferBench, enabling precise preference modeling, controllable generation, and trustworthy evaluation. This work establishes a scalable technical paradigm for personalized AIGC.
📝 Abstract
Recent advances in AIGC (Artificial Intelligence Generated Content) models have enabled significant progress in image and video generation. However, users still struggle to obtain content that aligns with their preferences due to the difficulty of crafting detailed prompts and the lack of mechanisms to retain their preferences. To address these challenges, we construct UniPrefer-100K, a large-scale dataset comprising images, videos, and associated text that describes the styles users tend to prefer. Based on UniPrefer-100K, we propose MagicWand, a universal generation and evaluation agent that enhances prompts based on user preferences, leverages advanced generation models for high-quality content, and applies preference-aligned evaluation and refinement. In addition, we introduce UniPreferBench, the first large-scale benchmark with over 120K annotations for assessing user preference alignment across diverse AIGC tasks. Experiments on UniPreferBench demonstrate that MagicWand consistently generates content and evaluations that are well aligned with user preferences across a wide range of scenarios.