Item Region-based Style Classification Network (IRSN): A Fashion Style Classifier Based on Domain Knowledge of Fashion Experts

📅 2025-12-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Fashion style classification is difficult for two reasons: large visual variation within a single style and high visual similarity between different styles. The paper proposes the Item Region-based Style Classification Network (IRSN), which analyzes item-specific features and their combinations alongside global features. Item region pooling (IRP) extracts the features of each item region, the regions are analyzed separately, and gated feature fusion (GFF) combines the results. A dual-backbone feature extractor pairs a domain-specific extractor with a general extractor pre-trained on a large-scale image-text dataset. Applied to six widely used backbones, including EfficientNet, ConvNeXt, and Swin Transformer, IRSN improves accuracy by an average of 6.9% (maximum 14.5%) on FashionStyle14 and by an average of 7.6% (maximum 15.1%) on ShowniqV3. Visualization analysis also indicates that IRSN models capture differences between similar style classes better than the baseline models.

📝 Abstract
Fashion style classification is a challenging task because of the large visual variation within the same style and the existence of visually similar styles. Styles are expressed not only by the global appearance, but also by the attributes of individual items and their combinations. In this study, we propose an item region-based fashion style classification network (IRSN) that classifies fashion styles by analyzing item-specific features and their combinations in addition to global features. IRSN extracts the features of each item region using item region pooling (IRP), analyzes them separately, and combines them using gated feature fusion (GFF). In addition, we improve the feature extractor by applying a dual-backbone architecture that combines a domain-specific feature extractor with a general feature extractor pre-trained on a large-scale image-text dataset. In experiments, applying IRSN to six widely used backbones, including EfficientNet, ConvNeXt, and Swin Transformer, improved style classification accuracy by an average of 6.9% and a maximum of 14.5% on the FashionStyle14 dataset, and by an average of 7.6% and a maximum of 15.1% on the ShowniqV3 dataset. Visualization analysis also supports that the IRSN models are better than the baseline models at capturing differences between similar style classes.
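The abstract does not spell out how item region pooling is computed. As a rough illustration only, the sketch below assumes that each item region is an axis-aligned box on a backbone feature map and that pooling is a plain average over the cells inside the box. The function name `item_region_pool`, the box format, and the toy regions are hypothetical and are not taken from the paper. The code is framework-free so that it runs with the Python standard library alone.

```python
def item_region_pool(feature_map, box):
    """Average the feature vectors inside one item region.

    feature_map: H x W grid of C-dimensional feature vectors (nested lists).
    box: (top, left, bottom, right) in feature-map cells, end-exclusive.
    Assumption: average pooling over a rectangular region; the paper's
    actual IRP operator may differ.
    """
    top, left, bottom, right = box
    cells = [feature_map[r][c]
             for r in range(top, bottom)
             for c in range(left, right)]
    if not cells:
        raise ValueError("empty item region")
    channels = len(cells[0])
    return [sum(cell[k] for cell in cells) / len(cells)
            for k in range(channels)]


# Toy 2 x 3 feature map with 2 channels per cell.
fmap = [
    [[1.0, 0.0], [3.0, 0.0], [5.0, 2.0]],
    [[1.0, 4.0], [3.0, 4.0], [5.0, 6.0]],
]

# Hypothetical item boxes: one row of cells per item.
regions = {"top": (0, 0, 1, 3), "bottom": (1, 0, 2, 3)}
pooled = {name: item_region_pool(fmap, box) for name, box in regions.items()}
```

Each entry of `pooled` is one fixed-length vector per item, which is the kind of per-item feature that could then be analyzed separately before fusion.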
Problem

Research questions and friction points this paper is trying to address.

Classifies fashion styles using item-specific features and combinations
Improves accuracy by analyzing both global and detailed item attributes
Addresses visual similarity challenges in style classification tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Item region pooling extracts features from specific fashion items
Gated feature fusion combines global and item-specific features
Dual-backbone architecture merges domain-specific and general feature extractors
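
The page does not give the equations behind gated feature fusion. One common form of gating, shown here purely as an assumed sketch, computes a sigmoid gate per feature dimension from the concatenated inputs and uses it to blend a global feature with an item-specific feature. The names `gated_fusion`, `weights`, and `bias` are hypothetical, and the paper's GFF module may be defined differently.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(global_feat, item_feat, weights, bias):
    """Blend two equal-length feature vectors with a per-dimension gate.

    gate[k]  = sigmoid(weights[k] . concat(global_feat, item_feat) + bias[k])
    fused[k] = gate[k] * global_feat[k] + (1 - gate[k]) * item_feat[k]
    Assumption: this is a generic gating scheme, not the paper's exact GFF.
    """
    joint = global_feat + item_feat  # list concatenation
    fused = []
    for k in range(len(global_feat)):
        score = sum(w * x for w, x in zip(weights[k], joint)) + bias[k]
        gate = sigmoid(score)
        fused.append(gate * global_feat[k] + (1.0 - gate) * item_feat[k])
    return fused


global_feat = [1.0, 2.0]
item_feat = [3.0, 6.0]

# With zero weights and zero bias every gate is 0.5, so the result is
# the plain average of the two inputs.
zero_w = [[0.0] * 4 for _ in range(2)]
fused = gated_fusion(global_feat, item_feat, zero_w, [0.0, 0.0])
```

In a trained model the weights and bias would be learned, letting the gate lean toward the global feature for some inputs and toward the item feature for others.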
Jinyoung Choi
School of CSEE, Handong Global University, 558 Handong-ro Buk-gu, Pohang, 37554, Gyeongbuk, Republic of Korea.
Youngchae Kwon
School of CSEE, Handong Global University, 558 Handong-ro Buk-gu, Pohang, 37554, Gyeongbuk, Republic of Korea.
Injung Kim
Professor, Handong Global University
AI, deep learning, image analysis and synthesis, speech synthesis, smart factory