Z-SASLM: Zero-Shot Style-Aligned SLI Blending Latent Manipulation

📅 2025-03-29
🤖 AI Summary
Existing multi-style fusion methods rely on linear interpolation, which neglects the non-Euclidean geometry of the latent space and leads to style distortion and semantic inconsistency. To address this, we propose a zero-shot, fine-tuning-free Spherical Linear Interpolation (SLI) framework for style fusion, modeling style vectors as points on a sphere and enforcing geodesic relationships within the DINO ViT-B/8 feature space to achieve geometry-preserving latent-space alignment. We further introduce a novel weighted multi-style DINO evaluation metric that enables adaptive style weight assignment based on perceptual relevance. Experiments demonstrate that our method improves the visual fidelity and cross-style semantic consistency of fused outputs, while exhibiting robust style alignment across multiple benchmarks (including StyleMix, FFHQ-Style, and CelebA-HQ) without requiring architectural modifications or additional training.

📝 Abstract
We introduce Z-SASLM, a Zero-Shot Style-Aligned SLI (Spherical Linear Interpolation) Blending Latent Manipulation pipeline that overcomes the limitations of current multi-style blending methods. Conventional approaches rely on linear blending, which assumes a flat latent space and therefore produces suboptimal results when integrating multiple reference styles. In contrast, our framework leverages the non-linear geometry of the latent space by using SLI Blending to combine weighted style representations. By interpolating along the geodesic on the hypersphere, Z-SASLM preserves the intrinsic structure of the latent space, ensuring high-fidelity and coherent blending of diverse styles, all without the need for fine-tuning. We further propose a new metric, Weighted Multi-Style DINO ViT-B/8, designed to quantitatively evaluate the consistency of the blended styles. While our primary focus is on the theoretical and practical advantages of SLI Blending for style manipulation, we also demonstrate its effectiveness in a multi-modal content fusion setting through comprehensive experimental studies. Experimental results show that Z-SASLM achieves enhanced and robust style alignment. The implementation code can be found at: https://github.com/alessioborgi/Z-SASLM.
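
To illustrate the geodesic interpolation described in the abstract, the following is a minimal sketch of spherical linear interpolation in plain Python. It is not taken from the Z-SASLM repository. The function names `slerp` and `blend_styles` are hypothetical, and folding several weighted styles into one by repeated pairwise interpolation is an assumption made here for illustration; the paper may combine weighted styles differently. Inputs are assumed to be non-zero vectors with positive weights.

```python
import math


def slerp(v0, v1, t, eps=1e-7):
    """Spherical linear interpolation between vectors v0 and v1 at fraction t in [0, 1]."""
    n0 = math.sqrt(sum(x * x for x in v0))
    n1 = math.sqrt(sum(x * x for x in v1))
    # Angle between the two directions, clamped to guard against rounding error.
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (n0 * n1)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    s = math.sin(omega)
    if s < eps:
        # Nearly parallel or antipodal: the geodesic is ill-defined, so fall back to lerp.
        return [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]
    c0 = math.sin((1.0 - t) * omega) / s
    c1 = math.sin(t * omega) / s
    return [c0 * a + c1 * b for a, b in zip(v0, v1)]


def blend_styles(vectors, weights):
    """Fold weighted style vectors into one by repeated slerp (assumed scheme, not the paper's)."""
    blended, total = list(vectors[0]), weights[0]
    for vec, w in zip(vectors[1:], weights[1:]):
        total += w
        # The new style receives its share of the weight accumulated so far.
        blended = slerp(blended, vec, w / total)
    return blended


if __name__ == "__main__":
    a, b = [1.0, 0.0], [0.0, 1.0]
    slerp_mid = slerp(a, b, 0.5)
    lerp_mid = [0.5 * x + 0.5 * y for x, y in zip(a, b)]
    print(math.hypot(*slerp_mid))  # norm of the spherical midpoint
    print(math.hypot(*lerp_mid))   # norm of the linear midpoint
```

For two orthogonal unit vectors, the spherical midpoint keeps a norm of approximately 1, while the linear midpoint shrinks toward the origin. That loss of magnitude is the kind of distortion a flat-space assumption can introduce.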

Problem

Research questions and friction points this paper is trying to address.

Overcoming limitations of linear blending in multi-style fusion
Ensuring high-fidelity style blending without fine-tuning
Proposing a metric to evaluate blended style consistency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses SLI Blending for non-linear style combination
Interpolates along hypersphere geodesic for fidelity
Introduces Weighted Multi-Style DINO ViT-B/8 metric
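
This page does not give the exact definition of the Weighted Multi-Style DINO ViT-B/8 metric, so the sketch below only shows one plausible shape for such a score: a weight-normalized average of cosine similarities between an embedding of the generated image and an embedding of each reference style. The function names and the toy vectors are hypothetical. In practice the embeddings would presumably come from a DINO ViT-B/8 encoder, which is not loaded here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def weighted_multi_style_score(generated_emb, reference_embs, weights):
    """Weight-normalized mean similarity to each reference style (assumed form of the metric)."""
    total = sum(weights)
    return sum(
        (w / total) * cosine_similarity(generated_emb, ref)
        for ref, w in zip(reference_embs, weights)
    )


if __name__ == "__main__":
    # Toy 2-D stand-ins for image embeddings; real embeddings would be much larger.
    generated = [1.0, 0.0]
    references = [[1.0, 0.0], [0.0, 1.0]]
    print(weighted_multi_style_score(generated, references, [3.0, 1.0]))
```

Under this assumed form, a style that contributes more weight to the blend also counts more toward the score, so an output that matches only a low-weight style is rated lower than one that matches the dominant style.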