🤖 AI Summary
This study systematically evaluates the cross-domain transferability (ImageNet → cultural heritage) of six mainstream architectures (VGG, ResNet, DenseNet, Vision Transformer, Swin Transformer, and PoolFormer) for cultural heritage image analysis. Adopting a unified pretraining and fine-tuning framework with standardized data augmentation and cross-dataset evaluation protocols, it establishes, for the first time, a comparable benchmark in this domain. To assess practical deployment viability, the work introduces the "efficiency-computation ratio" as a core metric, jointly quantifying classification accuracy, GPU memory footprint, and inference latency. Experimental results show that DenseNet achieves the best overall trade-off: it maintains high classification accuracy while reducing GPU memory consumption by 42% and accelerating inference by 1.8× on average compared to ViT-based models. These findings provide principled guidance for model selection and establish a lightweight, transferable paradigm for intelligent cultural heritage analysis.
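The summary does not give the exact formula behind the efficiency-computation ratio, so the sketch below assumes a simple accuracy-per-resource definition (accuracy divided by the product of GPU memory and inference latency). The function name and the numeric inputs are illustrative, not measured values from the paper; the example only shows how the reported 42% memory saving and 1.8× speedup would compound under such a metric at equal accuracy.

```python
# Hypothetical "efficiency-computation ratio": the paper's exact formula is not
# given in the summary, so this assumes accuracy per unit of resource cost.

def efficiency_computation_ratio(accuracy: float, mem_gb: float, latency_ms: float) -> float:
    """Accuracy divided by the memory-latency product (higher is better)."""
    return accuracy / (mem_gb * latency_ms)

# Illustrative (not measured) baseline numbers for a ViT-style model.
vit_ratio = efficiency_computation_ratio(accuracy=0.90, mem_gb=10.0, latency_ms=18.0)

# DenseNet per the summary: 42% less GPU memory, 1.8x faster inference,
# comparable accuracy.
densenet_ratio = efficiency_computation_ratio(
    accuracy=0.90, mem_gb=10.0 * (1 - 0.42), latency_ms=18.0 / 1.8
)

print(round(densenet_ratio / vit_ratio, 2))  # → 3.1
```

Under this assumed formulation, the memory and latency savings alone multiply into roughly a 3× gain in the ratio, which illustrates why a joint metric can rank models differently than accuracy alone.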
📝 Abstract
The integration of computer vision and deep learning is essential for documenting and preserving cultural heritage, as well as for improving visitor experiences. In recent years, two deep learning paradigms have become established in computer vision: convolutional neural networks and transformer architectures. The present study offers a comparative analysis of representative models from these two paradigms with respect to their ability to transfer knowledge from a generic dataset, such as ImageNet, to cultural-heritage-specific tasks. Tests of the architectures VGG, ResNet, DenseNet, Vision Transformer, Swin Transformer, and PoolFormer showed that DenseNet offers the best efficiency-computation ratio.