EnGraf-Net: Multiple Granularity Branch Network with Fine-Coarse Graft Grained for Classification Task

📅 2025-09-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Fine-grained classification must cope with large intra-class variation and small inter-class variation at the same time. Existing approaches often rely on part-level annotations or attention mechanisms, which can yield incomplete local feature representations and neglect semantic correlations across granularity levels. To address this, we propose EnGraf-Net, a multi-granularity branch network that requires neither part annotations nor image cropping. It uses the fine-to-coarse semantic relationships encoded in a taxonomy hierarchy as supervisory signals within an end-to-end model. Through a “fine-to-coarse grafting” mechanism, EnGraf-Net fuses hierarchical semantics into multi-level feature representations, avoiding the annotation overhead of part-based methods. Experiments on CIFAR-100, CUB-200-2011, and FGVC-Aircraft show that EnGraf-Net outperforms many existing fine-grained models and is competitive with the most recent state-of-the-art approaches.

📝 Abstract

Fine-grained classification models are designed to focus on the relevant details necessary to distinguish highly similar classes, particularly when intra-class variance is high and inter-class variance is low. Most existing models rely on part annotations such as bounding boxes, part locations, or textual attributes to enhance classification performance, while others employ sophisticated techniques to automatically extract attention maps. We posit that part-based approaches, including automatic cropping methods, suffer from an incomplete representation of local features, which are fundamental for distinguishing similar objects. While fine-grained classification aims to recognize the leaves of a hierarchical structure, humans recognize objects by also forming semantic associations. In this paper, we leverage semantic associations structured as a hierarchy (taxonomy) as supervised signals within an end-to-end deep neural network model, termed EnGraf-Net. Extensive experiments on three well-known datasets (CIFAR-100, CUB-200-2011, and FGVC-Aircraft) demonstrate the superiority of EnGraf-Net over many existing fine-grained models, showing competitive performance with the most recent state-of-the-art approaches, without requiring cropping techniques or manual annotations.

Problem

Research questions and friction points this paper is trying to address.

Improving fine-grained classification by addressing incomplete local feature representation

Eliminating dependency on part annotations or manual cropping techniques

Leveraging semantic hierarchy as supervised signals for object recognition

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses hierarchical taxonomy as supervised signals

End-to-end deep neural network without manual annotations

Multi-granularity branch network with fine-to-coarse grafting
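
The idea of using a taxonomy as a supervisory signal can be sketched in a few lines. The snippet below is only an illustration and not the authors' implementation: the page does not describe the loss or the grafting mechanism, so the toy taxonomy `FINE_TO_COARSE`, the `joint_loss` function, and the `coarse_weight` parameter are all hypothetical. It assumes a model with one fine-grained branch and one coarse branch, and shows how a coarse label can be derived from the fine label so that no extra annotation is needed.

```python
import math

# Hypothetical toy taxonomy: fine class index -> coarse (parent) class index.
FINE_TO_COARSE = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-likelihood of the target class under softmax(logits)."""
    return -math.log(softmax(logits)[target])


def joint_loss(fine_logits, coarse_logits, fine_label, coarse_weight=0.5):
    """Fine-branch loss plus a weighted coarse-branch loss.

    The coarse label is looked up from the taxonomy, so only the fine
    label has to be supplied. coarse_weight is an illustrative assumption.
    """
    coarse_label = FINE_TO_COARSE[fine_label]
    fine_term = cross_entropy(fine_logits, fine_label)
    coarse_term = cross_entropy(coarse_logits, coarse_label)
    return fine_term + coarse_weight * coarse_term


# Example: one sample whose fine label is 3 (coarse parent 1).
fine_logits = [0.1, 0.2, 0.3, 2.0, 0.4]
coarse_logits = [0.5, 1.5]
loss = joint_loss(fine_logits, coarse_logits, fine_label=3)
print(f"joint loss: {loss:.4f}")
```

Setting `coarse_weight=0` reduces this to ordinary fine-grained cross-entropy, which makes it easy to check how much the coarse term contributes in such a setup.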
