🤖 AI Summary
This work addresses fine-grained food image retrieval in the ICCV LargeFineFoodAI Retrieval Competition on Kaggle. The pipeline combines multi-model feature fusion with a re-ranking step: (1) four backbone models are trained independently, each with a weighted sum of ArcFace and Circle loss; (2) test-time augmentation (TTA) and model ensembling are then applied in succession to improve the feature representation; (3) a new re-ranking method, built on diffusion and k-reciprocal re-ranking, refines the retrieved lists. The method scores 0.81219 mAP@100 on the public leaderboard and 0.81191 on the private leaderboard, placing third overall. The main contribution is the re-ranking method, which targets the large intra-class variation and high inter-class similarity typical of fine-grained food retrieval.
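The TTA and ensembling step can be illustrated with a minimal sketch. The source does not say which augmentations or which fusion rule were used, so the horizontal flip, the averaging of flipped and unflipped embeddings, and the concatenation across models are all assumptions made here for illustration. The names `tta_embed` and `ensemble_embed` are hypothetical, and each `model` is assumed to be a callable that maps an `(N, C, H, W)` batch to an `(N, D)` feature array.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def tta_embed(model, images):
    """Sum the normalized embeddings of the original and flipped batch.

    Assumption: horizontal flip is the only augmentation; the source
    does not list the augmentations actually used.
    """
    flipped = images[:, :, :, ::-1]  # flip the width axis of (N, C, H, W)
    feats = l2_normalize(model(images)) + l2_normalize(model(flipped))
    return l2_normalize(feats)

def ensemble_embed(models, images):
    """Concatenate per-model TTA embeddings, then re-normalize.

    Assumption: concatenation is one common fusion rule, not
    necessarily the one used in the competition solution.
    """
    feats = np.concatenate([tta_embed(m, images) for m in models], axis=1)
    return l2_normalize(feats)
```

Because every output row has unit length, the dot product between two fused embeddings is their cosine similarity, which is the usual input to a re-ranking stage.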
📝 Abstract
This paper presents the 3rd-place solution to the ICCV LargeFineFoodAI Retrieval Competition on Kaggle. Four base models are trained independently with a weighted sum of ArcFace and Circle loss; test-time augmentation (TTA) and model ensembling are then applied in succession to strengthen the feature representation. In addition, a new re-ranking method for retrieval is proposed, based on diffusion and k-reciprocal re-ranking. The final method scores 0.81219 and 0.81191 mAP@100 on the public and private leaderboards, respectively.
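To make the k-reciprocal part of the re-ranking concrete, here is a minimal sketch of the standard k-reciprocal neighbor definition: item j is kept for item i only if each appears in the other's top-k list. This is not the authors' re-ranking method. The diffusion component and the way the two are combined are not described in the text above and are not shown. The function name `k_reciprocal_neighbors` and the similarity-matrix input are assumptions for illustration.

```python
import numpy as np

def k_reciprocal_neighbors(sim, k):
    """For each item i, return the set of j with i and j in each other's top-k.

    sim: (n, n) similarity matrix, higher means more similar.
    Each item normally appears in its own set, since self-similarity
    is maximal for cosine similarity of unit vectors.
    """
    n = sim.shape[0]
    # Indices of the k most similar items in each row.
    topk = np.argsort(-sim, axis=1)[:, :k]
    # in_topk[i, j] is True when j is among the top-k of i.
    in_topk = np.zeros((n, n), dtype=bool)
    np.put_along_axis(in_topk, topk, True, axis=1)
    # Keep only pairs where the relation holds in both directions.
    mutual = in_topk & in_topk.T
    return [set(np.flatnonzero(mutual[i]).tolist()) for i in range(n)]
```

The reciprocity check filters out one-sided matches: an ambiguous image may rank a clean image among its nearest neighbors without the clean image returning the favor, and such pairs are dropped.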