🤖 AI Summary
Artistic style transfer has long lacked real-world, style-free content images that could serve as supervision signals. To address this, we propose a *destylization*-based paradigm and introduce DST-100K, the first large-scale, real-world style-content paired dataset. First, we design DST, a text-guided destylization model that generates style-free counterparts of artworks, preserving semantic content while removing stylistic attributes. Second, we develop DST-Filter, a multi-stage filtering mechanism driven by chain-of-thought reasoning, to ensure high data fidelity and diversity. Finally, we use DST-100K to train OmniStyle2, a feed-forward style transfer model built on FLUX.1-dev. Extensive experiments demonstrate that OmniStyle2 consistently outperforms state-of-the-art methods in both qualitative and quantitative evaluations, significantly improving photorealism and content preservation. Our results validate the effectiveness and scalability of the destylization-driven, data-centric paradigm for artistic style transfer.
📝 Abstract
OmniStyle2 introduces a novel approach to artistic style transfer by reframing it as a data problem. Our key insight is destylization: reversing style transfer by removing stylistic elements from artworks to recover natural, style-free counterparts. This yields DST-100K, a large-scale dataset that provides authentic supervision signals by aligning real artistic styles with their underlying content. To build DST-100K, we develop (1) DST, a text-guided destylization model that reconstructs style-free content, and (2) DST-Filter, a multi-stage evaluation model that employs Chain-of-Thought reasoning to automatically discard low-quality pairs while ensuring content fidelity and style accuracy. Leveraging DST-100K, we train OmniStyle2, a simple feed-forward model based on FLUX.1-dev. Despite its simplicity, OmniStyle2 consistently surpasses state-of-the-art methods across both qualitative and quantitative benchmarks. Our results demonstrate that scalable data generation via destylization provides a reliable supervision paradigm, overcoming the fundamental challenge posed by the lack of ground-truth data in artistic style transfer.
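The data-construction loop described above (destylize each artwork, then filter the resulting pairs) can be illustrated with a minimal Python sketch. This is not the authors' implementation: `Pair`, `build_pairs`, the two callables, and the single `threshold` are hypothetical names and simplifications, and the toy scorer stands in for DST-Filter's multi-stage Chain-of-Thought evaluation, whose actual criteria are not given here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Pair:
    """One training pair: a real artwork and its style-free counterpart."""
    artwork: str  # identifier of the real, stylized artwork
    content: str  # identifier of the destylized (style-free) image


def build_pairs(
    artworks: Iterable[str],
    destylize: Callable[[str], str],
    score: Callable[[str, str], Tuple[float, float]],
    threshold: float = 0.5,
) -> List[Pair]:
    """Destylize each artwork and keep only pairs that pass the filter.

    `destylize` stands in for a destylization model such as DST.
    `score` stands in for a filter such as DST-Filter and returns
    (content_fidelity, style_accuracy); the score range and the
    threshold are assumptions made for this sketch.
    """
    kept: List[Pair] = []
    for art in artworks:
        content = destylize(art)
        content_fidelity, style_accuracy = score(art, content)
        # Assumed rule: a pair survives only if both scores clear the threshold.
        if min(content_fidelity, style_accuracy) >= threshold:
            kept.append(Pair(artwork=art, content=content))
    return kept


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any model.
    def toy_destylize(art: str) -> str:
        return art + "_content"

    def toy_score(art: str, content: str) -> Tuple[float, float]:
        return (0.2, 0.9) if art.startswith("bad") else (0.9, 0.8)

    pairs = build_pairs(["monet_01", "bad_scan", "vangogh_07"], toy_destylize, toy_score)
    print([p.artwork for p in pairs])
```

In this sketch the surviving pairs would then serve as supervision for a feed-forward stylization model, with the destylized image as the content input and the original artwork as the target.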