DORAEMON: A Unified Library for Visual Object Modeling and Representation Learning at Scale

📅 2025-11-06

🤖 AI Summary

Problem: Existing approaches to multi-scale visual object modeling and representation learning lack a unified, extensible framework that supports diverse vision tasks. Method: This paper introduces an open-source PyTorch platform that unifies classification, retrieval, and metric learning in a single, YAML-driven, modular workflow. It exposes more than 1,000 pretrained backbones through a timm-compatible interface, provides plug-and-play loss functions and data augmentation strategies, and supports distributed training as well as one-command export to ONNX and Hugging Face formats. Contribution/Results: The platform's reproducible recipes match or exceed reference results on ImageNet-1K, MS-Celeb-1M, and Stanford Online Products. By consolidating datasets, models, and training techniques in one place, it aims to speed up experimentation, make results easier to reproduce, and bridge the gap between vision research and production deployment with a complete “research-to-deployment” pipeline for visual recognition.

📝 Abstract

DORAEMON is an open-source PyTorch library that unifies visual object modeling and representation learning across diverse scales. A single YAML-driven workflow covers classification, retrieval, and metric learning; more than 1,000 pretrained backbones are exposed through a timm-compatible interface, together with modular losses, augmentations, and distributed-training utilities. Reproducible recipes match or exceed reference results on ImageNet-1K, MS-Celeb-1M, and Stanford Online Products, while one-command export to ONNX or Hugging Face bridges research and deployment. By consolidating datasets, models, and training techniques into one platform, DORAEMON offers a scalable foundation for rapid experimentation in visual recognition and representation learning, enabling efficient transfer of research advances to real-world applications. The repository is available at https://github.com/wuji3/DORAEMON.
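
As a rough illustration of what a config-driven, modular workflow can look like, the sketch below dispatches on a task name through a small registry. It is not DORAEMON's actual API: the registry `TASKS`, the decorator `register_task`, the builder functions, and the config keys (`task`, `model.backbone`, `loss.name`) are all hypothetical, and a plain Python dict stands in for the result of parsing a YAML file.

```python
from typing import Callable, Dict

# Hypothetical registry mapping task names to builder functions.
# These names are illustrative only and are not taken from DORAEMON.
TASKS: Dict[str, Callable[..., str]] = {}


def register_task(name: str):
    """Register a builder function under a task name."""
    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        TASKS[name] = fn
        return fn
    return decorator


@register_task("classification")
def build_classification(backbone: str, loss: str) -> str:
    # A real builder would construct a model, loss, and data pipeline.
    return f"classification pipeline: backbone={backbone}, loss={loss}"


@register_task("retrieval")
def build_retrieval(backbone: str, loss: str) -> str:
    return f"retrieval pipeline: backbone={backbone}, loss={loss}"


def build_from_config(config: dict) -> str:
    """Select a builder by task name and pass it the configured components."""
    task = config["task"]
    if task not in TASKS:
        raise KeyError(f"unknown task {task!r}; available: {sorted(TASKS)}")
    return TASKS[task](
        backbone=config["model"]["backbone"],
        loss=config["loss"]["name"],
    )


# Stand-in for a parsed YAML file (assumed layout, not the library's schema).
config = {
    "task": "retrieval",
    "model": {"backbone": "resnet50"},
    "loss": {"name": "triplet"},
}
summary = build_from_config(config)
print(summary)
```

With this kind of design, switching from retrieval to classification means editing one value in the config rather than changing code, which is the general appeal of a single config-driven workflow.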

Problem

Research questions and friction points this paper is trying to address.

Visual object modeling and representation learning across scales lack a single unified library

Rapid experimentation in visual recognition needs a scalable, shared foundation

Research code and deployment are disconnected, with no simple export path between them

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified PyTorch library for visual object modeling

YAML-driven workflow for classification and retrieval tasks

One-command export to ONNX and Hugging Face formats
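
To make the retrieval task mentioned above concrete, here is a minimal sketch of ranking gallery items by cosine similarity to a query embedding. It assumes embeddings have already been produced by some backbone; the functions `cosine_similarity` and `rank_gallery` and the toy vectors are illustrative, and the summary does not say which similarity measure or ranking code DORAEMON itself uses.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_gallery(query: Sequence[float],
                 gallery: Sequence[Sequence[float]]) -> List[int]:
    """Return gallery indices ordered from most to least similar."""
    scores = [cosine_similarity(query, item) for item in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


# Toy 2-D embeddings standing in for backbone outputs.
gallery = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query = [0.9, 0.1]
ranking = rank_gallery(query, gallery)
print(ranking)
```

Metric learning losses are typically designed so that this kind of nearest-neighbor ranking places items of the same identity or product first.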
