🤖 AI Summary
Flexible object recognition faces challenges including shape deformability, translucency, and subtle inter-class distinctions. Existing graph-based models struggle to jointly capture local semantics and global visual relationships, and suffer from insufficient semantic–visual alignment. To address these issues, we propose a semantic-enhanced heterogeneous graph learning framework: (i) an adaptive scanning module dynamically aligns visual and semantic nodes; (ii) a heterogeneous graph generation module fuses local semantic cues with global appearance information; and (iii) we introduce FSCW, a large-scale flexible object dataset curated from existing sources. Our method achieves consistent improvements over state-of-the-art (SOTA) methods on the flexible object datasets FDA and FSCW, and delivers competitive performance on CIFAR-100 and ImageNet-Hard, demonstrating the effectiveness of heterogeneous graph modeling and cross-modal alignment.
📝 Abstract
Flexible object recognition remains a significant challenge due to the objects' inherently diverse shapes and sizes, translucent attributes, and subtle inter-class differences. Graph-based models, such as graph convolutional networks and graph vision models, are promising for flexible object recognition because of their ability to capture variable relations within flexible objects. These methods, however, often focus only on global visual relationships or fail to align semantic and visual information. To alleviate these limitations, we propose a semantic-enhanced heterogeneous graph learning method. First, an adaptive scanning module extracts discriminative semantic context, facilitating the matching of flexible objects with varying shapes and sizes while aligning semantic and visual nodes to enhance cross-modal feature correlation. Second, a heterogeneous graph generation module aggregates global visual and local semantic node features, improving the recognition of flexible objects. Additionally, we introduce FSCW, a large-scale flexible object dataset curated from existing sources. We validate our method through extensive experiments on flexible object datasets (FDA and FSCW) and challenging benchmarks (CIFAR-100 and ImageNet-Hard), demonstrating competitive performance.