🤖 AI Summary
Existing approaches to multi-scale visual object modeling and representation learning lack a unified, extensible framework that supports diverse vision tasks. Method: This paper introduces DORAEMON, an open-source PyTorch library that unifies classification, retrieval, and metric learning in a single YAML-driven, modular workflow. It exposes more than 1,000 pretrained backbones through a timm-compatible interface, provides plug-and-play loss functions and data augmentation strategies, and supports distributed training as well as one-command export to ONNX and Hugging Face formats. Contribution/Results: Reproducible training recipes match or exceed reference results on ImageNet-1K, MS-Celeb-1M, and Stanford Online Products. By consolidating datasets, models, and training techniques in one platform, the library bridges vision research and production deployment, offering a complete "research-to-deployment" pipeline for visual recognition.
📝 Abstract
DORAEMON is an open-source PyTorch library that unifies visual object modeling and representation learning across diverse scales. A single YAML-driven workflow covers classification, retrieval, and metric learning; more than 1,000 pretrained backbones are exposed through a timm-compatible interface, together with modular losses, augmentations, and distributed-training utilities. Reproducible recipes match or exceed reference results on ImageNet-1K, MS-Celeb-1M, and Stanford Online Products, while one-command export to ONNX or Hugging Face bridges research and deployment. By consolidating datasets, models, and training techniques into one platform, DORAEMON offers a scalable foundation for rapid experimentation in visual recognition and representation learning, enabling efficient transfer of research advances to real-world applications. The repository is available at https://github.com/wuji3/DORAEMON.
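
To make the idea of a single config-driven workflow concrete, the following is a minimal sketch of how one configuration could select between tasks such as classification and retrieval. It is not DORAEMON's actual API: the registry, the builder functions, and every config key (`task`, `backbone`, `num_classes`, `embedding_dim`) are hypothetical names chosen for illustration, and a plain dictionary stands in for a parsed YAML file.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a task name to a builder function.
# None of these names come from the DORAEMON code base.
TASK_REGISTRY: Dict[str, Callable[[dict], str]] = {}


def register_task(name: str):
    """Decorator that records a builder function under a task name."""
    def decorator(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        TASK_REGISTRY[name] = fn
        return fn
    return decorator


@register_task("classification")
def build_classification(cfg: dict) -> str:
    # A real builder would construct a model; here we only describe it.
    return f"classifier(backbone={cfg['backbone']}, num_classes={cfg['num_classes']})"


@register_task("retrieval")
def build_retrieval(cfg: dict) -> str:
    return f"embedder(backbone={cfg['backbone']}, embedding_dim={cfg['embedding_dim']})"


def build_from_config(cfg: dict) -> str:
    """Dispatch on the 'task' key so one entry point serves every task."""
    task = cfg["task"]
    if task not in TASK_REGISTRY:
        raise KeyError(f"Unknown task '{task}'; known tasks: {sorted(TASK_REGISTRY)}")
    return TASK_REGISTRY[task](cfg)


# Stand-in for the dictionary a YAML loader would return (assumed layout).
config = {"task": "retrieval", "backbone": "resnet50", "embedding_dim": 128}
description = build_from_config(config)
print(description)
```

Under this assumed design, switching from retrieval to classification only requires changing the `task` key and its task-specific fields in the config, while the entry point stays the same.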