🤖 AI Summary
In fine-grained recognition, Euclidean distance fails to capture the nonlinear manifold structure of deep features, leading to inaccurate semantic discrepancy modeling. To address this, we propose a manifold-aware prototype matching framework. Our core innovation is the first integration of diffusion maps with differentiable Nyström interpolation to construct a dynamically aligned manifold prototype space; compact, periodically updated landmark sets enable efficient geometric alignment and scalable prototype learning. The method is seamlessly embedded into deep feature extractors and supports end-to-end differentiable optimization. On CUB-200-2011 and Stanford Cars, it significantly outperforms Euclidean prototype baselines. Learned prototypes precisely localize semantically consistent regions, achieving both improved classification accuracy and strong interpretability.
📝 Abstract
Nonlinear manifolds are widespread in deep visual features, where Euclidean distances often fail to capture true similarity. This limitation becomes particularly severe in prototype-based interpretable fine-grained recognition, where subtle semantic distinctions are essential. To address this challenge, we propose a novel paradigm for prototype-based recognition that anchors similarity in the intrinsic geometry of deep features. Specifically, we distill the latent manifold structure of each class into a diffusion space and introduce a differentiable Nyström interpolation that makes this geometry accessible to both unseen samples and learnable prototypes. To ensure efficiency, we employ compact per-class landmark sets with periodic updates, which keep the embedding aligned with the evolving backbone and enable fast, scalable inference. Extensive experiments on the CUB-200-2011 and Stanford Cars datasets show that our GeoProto framework produces prototypes that focus on semantically aligned parts, significantly outperforming Euclidean prototype networks.
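The two core ingredients described above, a diffusion-map embedding of a per-class landmark set and a Nyström out-of-sample extension that maps new samples (or learnable prototypes) into that space, can be sketched as follows. This is a minimal NumPy illustration under our own assumptions (Gaussian kernel, bandwidth `eps`, function names); the paper's version is differentiable end-to-end, which in practice means writing the same operations in an autograd framework rather than plain NumPy.

```python
import numpy as np

def diffusion_map(landmarks, eps, n_components, t=1):
    """Diffusion-map embedding of a landmark set (Coifman & Lafon style)."""
    d2 = ((landmarks[:, None] - landmarks[None]) ** 2).sum(-1)
    K = np.exp(-d2 / eps)                      # Gaussian affinity between landmarks
    d = K.sum(1)
    A = K / np.sqrt(d[:, None] * d[None])      # symmetric normalization of the kernel
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(-vals)                  # sort eigenpairs by decreasing eigenvalue
    vals, vecs = vals[order], vecs[:, order]
    phi = vecs / np.sqrt(d)[:, None]           # right eigenvectors of P = D^{-1} K
    lam = vals[1:n_components + 1]             # drop the trivial (constant) eigenpair
    phi = phi[:, 1:n_components + 1]
    return (lam ** t) * phi, lam, phi          # embedding psi = lam^t * phi, spectrum, eigvecs

def nystrom_extend(x, landmarks, lam, phi, eps, t=1):
    """Nyström out-of-sample extension: diffusion coordinates for a new point x."""
    d2 = ((x[None] - landmarks) ** 2).sum(-1)
    p = np.exp(-d2 / eps)
    p = p / p.sum()                            # transition probabilities from x to landmarks
    return (lam ** (t - 1)) * (p @ phi)        # psi_k(x) = lam_k^t * phi_k(x)
```

In this sketch, a landmark point extended through `nystrom_extend` recovers its own embedding exactly, which is the consistency property that lets unseen samples and prototypes share one geometry; refreshing the landmark set periodically, as the abstract describes, then keeps that geometry aligned with the evolving backbone features.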