🤖 AI Summary
To address the lack of semantic interpretability and cross-class discriminability in hash codes for fine-grained image retrieval, this paper proposes an attribute-aware hashing method based on learnable queries. The method has three components: learnable visual attribute queries, so that each hash bit explicitly corresponds to a semantically meaningful attribute; an auxiliary branch that models high-order attribute interactions, improving the discriminability and robustness of low-bit hash codes; and a Transformer architecture with multi-head self-attention, trained end to end. On standard benchmarks, including CUB-200 and Stanford Dogs, the method outperforms state-of-the-art hashing approaches in retrieval accuracy, with the largest gains at short code lengths (16–32 bits). It thus combines semantic interpretability, with each bit grounded in a human-understandable attribute, and strong retrieval performance.
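Short codes matter because retrieval with binary hashes reduces to ranking by Hamming distance, which is one XOR and a bit count per database item. The sketch below illustrates that generic retrieval step, common to hashing methods in general; it is not code from the paper, and the helper names `pack_bits`, `hamming` and `rank_by_hamming` are hypothetical.

```python
from typing import List


def pack_bits(bits: List[int]) -> int:
    """Pack a sequence of +1/-1 hash bits into one integer (+1 -> 1, -1 -> 0)."""
    code = 0
    for b in bits:
        code = (code << 1) | (1 if b > 0 else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Hamming distance between two packed codes of equal length: popcount(a XOR b)."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: List[int]) -> List[int]:
    """Return database indices sorted by Hamming distance to the query (ties broken by index)."""
    return sorted(range(len(database)), key=lambda i: (hamming(query, database[i]), i))


if __name__ == "__main__":
    # Toy 8-bit codes; real systems would use 16-64 bits per image.
    db_bits = [
        [1, 1, -1, -1, 1, -1, 1, 1],
        [-1, -1, 1, 1, -1, 1, -1, -1],
        [1, 1, -1, 1, 1, -1, 1, 1],
    ]
    db = [pack_bits(b) for b in db_bits]
    q = pack_bits([1, 1, -1, -1, 1, -1, 1, -1])
    print(rank_by_hamming(q, db))  # [0, 2, 1]
```

Because each code fits in a machine word at 16–32 bits, the distance computation stays cheap even over large galleries, which is why accuracy at these short lengths is the setting emphasised above.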
📝 Abstract
Fine-grained hashing has become a powerful solution for rapid and efficient image retrieval, particularly in scenarios that require discriminating between visually similar categories. To make each hash bit correspond to a specific visual attribute, we propose a novel method that uses learnable queries for attribute-aware hash code learning. The method deploys a tailored set of queries to capture and represent nuanced attribute-level information within the hashing process, enhancing both the interpretability and the relevance of each hash bit. Building on this query-based optimization framework, we incorporate an auxiliary branch to help overcome the complex optimization landscape often encountered with low-bit hash codes. This auxiliary branch models high-order attribute interactions, reinforcing the robustness and specificity of the generated hash codes. Experimental results on benchmark datasets demonstrate that our method generates attribute-aware hash codes and consistently outperforms state-of-the-art techniques in retrieval accuracy and robustness, especially for low-bit hash codes, underscoring its potential for fine-grained image hashing tasks.
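The abstract does not give the architecture in detail, so the following is only a minimal NumPy sketch of the general idea under stated assumptions: K learnable queries (one per hash bit) cross-attend over backbone patch features with a single attention head, each attended query embedding is projected by a shared vector to one `tanh`-relaxed bit, and the auxiliary branch is approximated by pairwise (second-order) interactions between the query embeddings feeding a class classifier. The class `QueryHashHead`, its parameters, and the Gram-matrix form of the auxiliary branch are hypothetical stand-ins, not the authors' method; only the forward pass is shown, with random weights and no training loss.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class QueryHashHead:
    """Hypothetical hash head with one learnable query per bit/attribute.

    Weights are randomly initialised; in practice they would be learned
    end to end with retrieval and classification losses (not shown).
    """

    def __init__(self, dim: int, n_bits: int, n_classes: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        self.queries = rng.normal(scale=0.02, size=(n_bits, dim))  # K x D, one query per bit
        self.w_bit = rng.normal(scale=0.02, size=(dim,))  # shared projection: embedding -> scalar
        n_pairs = n_bits * (n_bits - 1) // 2
        # Assumed auxiliary classifier over pairwise attribute interactions.
        self.w_aux = rng.normal(scale=0.02, size=(n_pairs, n_classes))

    def attend(self, patches: np.ndarray):
        """Single-head cross-attention: each query pools the N x D patch features."""
        attn = softmax(self.queries @ patches.T / np.sqrt(self.dim), axis=-1)  # K x N
        return attn, attn @ patches  # K x D attribute embeddings

    def forward(self, patches: np.ndarray):
        attn, z = self.attend(patches)
        h = np.tanh(z @ self.w_bit)  # K continuous codes in (-1, 1), used during training
        bits = np.where(h >= 0, 1, -1)  # K binary bits used at retrieval time
        # Auxiliary branch (assumed form): second-order interactions taken from
        # the upper triangle of the Gram matrix Z Z^T, mapped to class logits.
        gram = z @ z.T
        iu = np.triu_indices(z.shape[0], k=1)
        aux_logits = gram[iu] @ self.w_aux
        return attn, h, bits, aux_logits


if __name__ == "__main__":
    # Toy input: 49 patch features of width 64 (e.g. a 7x7 feature map).
    patches = np.random.default_rng(1).normal(size=(49, 64))
    head = QueryHashHead(dim=64, n_bits=16, n_classes=200)
    attn, h, bits, aux = head.forward(patches)
    print(attn.shape, bits.shape, aux.shape)  # (16, 49) (16,) (200,)
```

Tying each bit to its own query is what makes a bit inspectable: its attention row over patches shows where in the image that bit "looks". The auxiliary logits would only add a training signal and are discarded at retrieval time.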