A Semantic-Enhanced Heterogeneous Graph Learning Method for Flexible Objects Recognition

📅 2025-03-28
🤖 AI Summary
Flexible object recognition is difficult because such objects deform, are often translucent, and differ only subtly between classes. Existing graph-based models tend to focus on global visual relationships or fail to align semantic and visual information. To address these issues, we propose a semantic-enhanced heterogeneous graph learning framework with three contributions: (i) an adaptive scanning module that extracts discriminative semantic context and aligns visual and semantic nodes; (ii) a heterogeneous graph generation module that aggregates global visual and local semantic node features; and (iii) FSCW, a large-scale flexible-object dataset curated from existing sources. Experiments on the flexible-object datasets FDA and FSCW and on the challenging benchmarks CIFAR-100 and ImageNet-Hard show competitive performance.

📝 Abstract
Flexible object recognition remains a significant challenge because such objects have inherently diverse shapes and sizes, translucent attributes, and subtle inter-class differences. Graph-based models, such as graph convolutional networks and graph vision models, are promising for this task because they can capture variable relations within flexible objects. These methods, however, often focus only on global visual relationships or fail to align semantic and visual information. To alleviate these limitations, we propose a semantic-enhanced heterogeneous graph learning method. First, an adaptive scanning module extracts discriminative semantic context, which facilitates the matching of flexible objects with varying shapes and sizes and aligns semantic and visual nodes to enhance cross-modal feature correlation. Second, a heterogeneous graph generation module aggregates global visual and local semantic node features, improving the recognition of flexible objects. Additionally, we introduce FSCW, a large-scale flexible-object dataset curated from existing sources. We validate our method through extensive experiments on flexible-object datasets (FDA and FSCW) and challenging benchmarks (CIFAR-100 and ImageNet-Hard), demonstrating competitive performance.
Problem

Research questions and friction points this paper is trying to address.

Recognizing flexible objects with diverse shapes and sizes
Aligning semantic and visual information for better recognition
Improving recognition of translucent objects with subtle differences
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive scanning module extracts semantic context
Heterogeneous graph aggregates visual and semantic features
Large-scale flexible dataset FSCW introduced for validation
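
The idea of linking visual and semantic nodes and then aggregating their features can be illustrated with a toy sketch. This is not the authors' implementation: the page does not describe the actual modules, so the cosine-similarity alignment, the top-k linking rule, the mixing weight `alpha`, and all function names below are assumptions chosen only to make the concept concrete.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_cross_edges(visual, semantic, k=1):
    """Link each visual node to its k most similar semantic nodes (assumed rule)."""
    edges = {}
    for i, v in enumerate(visual):
        ranked = sorted(range(len(semantic)),
                        key=lambda j: cosine(v, semantic[j]),
                        reverse=True)
        edges[i] = ranked[:k]
    return edges

def aggregate(visual, semantic, edges, alpha=0.5):
    """Mix each visual node with the mean of its linked semantic nodes.

    alpha is a hypothetical mixing weight; k >= 1 is assumed so that
    every visual node has at least one semantic neighbour.
    """
    fused = []
    for i, v in enumerate(visual):
        nbrs = [semantic[j] for j in edges[i]]
        mean = [sum(col) / len(nbrs) for col in zip(*nbrs)]
        fused.append([(1 - alpha) * a + alpha * b for a, b in zip(v, mean)])
    return fused

# Toy features: three visual nodes and two semantic nodes in 2-D.
visual = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
semantic = [[1.0, 0.0], [0.0, 2.0]]

edges = build_cross_edges(visual, semantic, k=1)
fused = aggregate(visual, semantic, edges)
```

Here the two node types are kept in separate lists and connected only through the cross-type `edges` mapping, which is the minimal sense in which the graph is heterogeneous. A learned model would replace the fixed similarity and averaging with trainable components.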
Kunshan Yang
University of Electronic Science and Technology of China
Wenwei Luo
University of Electronic Science and Technology of China
Yuguo Hu
University of Electronic Science and Technology of China
Jiafu Yan
University of Electronic Science and Technology of China
Mengmeng Jing
University of Electronic Science and Technology of China
Machine Learning, Computer Vision, Multimedia
Lin Zuo
University of Electronic Science and Technology of China