🤖 AI Summary
This work addresses the longstanding reliance of anatomical landmark detection in medical imaging on domain-specific models, which impedes leveraging the representational power of large-scale vision foundation models. We propose MedSapiens, the first adaptation of the human pose estimation foundation model Sapiens to medical landmark detection. Our approach employs cross-dataset, multi-stage pretraining followed by few-shot fine-tuning to transfer spatial pose priors from natural imagery to the medical domain. We systematically demonstrate the feasibility and superiority of general-purpose pose models for this task, establishing a new strong baseline. MedSapiens achieves state-of-the-art performance across multiple medical imaging benchmarks, with an average success detection rate (SDR) improvement of 5.26% over generic vision models and 21.81% over the best prior domain-specific methods. Notably, it maintains significant gains of +2.69% SDR even under few-shot settings, outperforming existing approaches.
📝 Abstract
This paper does not introduce a novel architecture; instead, it revisits a fundamental yet overlooked baseline: adapting human-centric foundation models for anatomical landmark detection in medical imaging. While landmark detection has traditionally relied on domain-specific models, the emergence of large-scale pre-trained vision models presents new opportunities. In this study, we investigate the adaptation of Sapiens, a human-centric foundation model designed for pose estimation, to medical imaging through multi-dataset pretraining, establishing a new state of the art across multiple datasets. Our proposed model, MedSapiens, demonstrates that human-centric foundation models, inherently optimized for spatial pose localization, provide strong priors for anatomical landmark detection, yet this potential has remained largely untapped. We benchmark MedSapiens against existing state-of-the-art models, achieving up to 5.26% improvement over generalist models and up to 21.81% improvement over specialist models in the average success detection rate (SDR). To further assess MedSapiens' adaptability to novel downstream tasks with few annotations, we evaluate its performance in limited-data settings, achieving a 2.69% SDR improvement over the few-shot state of the art. Code and model weights are available at https://github.com/xmed-lab/MedSapiens.
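For readers unfamiliar with the evaluation metric, the success detection rate (SDR) reported above is conventionally the fraction of landmarks whose predicted position falls within a given radius (in millimetres) of the ground truth. The sketch below is an illustrative implementation under that common definition; the function name, arguments, and the fixed `spacing_mm` conversion are assumptions, not the authors' released code.

```python
import numpy as np

def success_detection_rate(pred, gt, radii_mm, spacing_mm=1.0):
    """Illustrative SDR computation (assumed definition, not the paper's code).

    pred, gt:   (N, 2) arrays of landmark coordinates in pixels.
    radii_mm:   iterable of success radii in millimetres (e.g. 2, 2.5, 3, 4).
    spacing_mm: isotropic pixel spacing used to convert distances to mm.
    Returns a dict mapping each radius to the fraction of landmarks
    detected within that radius.
    """
    # Euclidean pixel distance per landmark, converted to millimetres.
    dist_mm = np.linalg.norm((pred - gt) * spacing_mm, axis=1)
    return {r: float(np.mean(dist_mm <= r)) for r in radii_mm}
```

For example, with two landmarks at distances of 0 mm and 5 mm from the ground truth, SDR at 2 mm is 0.5 and SDR at 5 mm is 1.0.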