WP-CLIP: Leveraging CLIP to Predict Wölfflin's Principles in Visual Art

📅 2025-08-18

🤖 AI Summary

This work addresses the lack of quantitative evaluation tools for Wölfflin's five classical art style principles (e.g., linear vs. painterly, flat vs. deep). We propose the first automatic prediction framework based on vision-language models. Methodologically, we adapt CLIP via supervised fine-tuning for fine-grained style scoring, integrating prompt engineering and continuous-score regression to enable interpretable modeling of abstract formal attributes. Evaluated on the Pandora-18K real-painting dataset and diverse GAN-generated images, our model significantly outperforms baselines, achieving state-of-the-art performance across all five principles while demonstrating strong cross-domain generalization. This work establishes the first reproducible, scalable benchmark framework for computational art history and AI-driven formal analysis.

📝 Abstract

Wölfflin's five principles offer a structured approach to analyzing stylistic variations for formal analysis. However, no existing metric effectively predicts all five principles in visual art. Computationally evaluating the visual aspects of a painting requires a metric that can interpret key elements such as color, composition, and thematic choices. Recent advancements in vision-language models (VLMs) have demonstrated their ability to evaluate abstract image attributes, making them promising candidates for this task. In this work, we investigate whether CLIP, pre-trained on large-scale data, can understand and predict Wölfflin's principles. Our findings indicate that it does not inherently capture such nuanced stylistic elements. To address this, we fine-tune CLIP on annotated datasets of real art images to predict a score for each principle. We evaluate our model, WP-CLIP, on GAN-generated paintings and the Pandora-18K art dataset, demonstrating its ability to generalize across diverse artistic styles. Our results highlight the potential of VLMs for automated art analysis.
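The abstract does not say how CLIP is queried for a principle, so the following is only a minimal sketch of one common approach: score an image against a pair of opposing text prompts and take a softmax over the two cosine similarities. The prompt wording, the `pair_score` helper, the temperature value, and the toy two-dimensional embeddings are all assumptions for illustration; in practice the vectors would come from CLIP's image and text encoders, and WP-CLIP's fine-tuned scoring may differ.

```python
import math

# Wölfflin's five opposing pairs. The prompt wording is hypothetical,
# not taken from the paper.
PRINCIPLE_PROMPTS = {
    "linear_vs_painterly": ("a linear painting", "a painterly painting"),
    "plane_vs_recession": ("a flat, planar painting", "a painting with deep recession"),
    "closed_vs_open": ("a closed-form painting", "an open-form painting"),
    "multiplicity_vs_unity": ("a painting of separate parts", "a unified painting"),
    "absolute_vs_relative_clarity": ("a sharply clear painting", "a painting with relative clarity"),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def pair_score(image_emb, first_emb, second_emb, temperature=0.01):
    """Return a score in [0, 1]: near 0 means the image matches the first
    prompt of the pair, near 1 means it matches the second.

    Computed as a two-way softmax over temperature-scaled cosine similarities.
    """
    s1 = cosine(image_emb, first_emb) / temperature
    s2 = cosine(image_emb, second_emb) / temperature
    m = max(s1, s2)  # subtract the max for numerical stability
    e1, e2 = math.exp(s1 - m), math.exp(s2 - m)
    return e2 / (e1 + e2)


# Toy embeddings standing in for encoder outputs (assumption).
image = [1.0, 0.0]
linear_prompt = [1.0, 0.0]
painterly_prompt = [0.0, 1.0]
score = pair_score(image, linear_prompt, painterly_prompt)
```

With these toy vectors the image aligns with the first prompt, so `score` falls close to 0. A fine-tuned model along the lines the abstract describes would instead be trained so that such scores track human annotations for each principle.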

Problem

Research questions and friction points this paper is trying to address.

Predicting Wölfflin's five principles in visual art

Evaluating color, composition, and thematic choices computationally

Fine-tuning CLIP for nuanced stylistic element prediction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tune CLIP for Wölfflin's principles prediction

Evaluate on GAN-generated and Pandora-18K datasets

Leverage vision-language models for art analysis
