🤖 AI Summary
Automated localization and description of abnormalities in multi-plane, whole-body CT remains a major challenge in clinical radiology. To address this, the authors collaborate with senior radiologists to build a hierarchical taxonomy of 404 representative abnormal findings spanning all body regions, and contribute a dataset of over 14.5K CT images with more than 19K fine-grained, spatially grounded abnormality annotations, each linked to a detailed description and mapped into the taxonomy. They propose OminiAbnorm-CT, a model that automatically grounds and describes abnormal findings on multi-plane, whole-body CT images from text queries, while also supporting flexible interaction through visual prompts. On three evaluation tasks derived from real clinical scenarios, OminiAbnorm-CT significantly outperforms existing methods across all tasks and metrics, establishing an abnormality-centric paradigm for clinical CT understanding.
📝 Abstract
Automated interpretation of CT images, particularly localizing and describing abnormal findings across multi-plane and whole-body scans, remains a significant challenge in clinical radiology. This work addresses the challenge through four key contributions: (i) On taxonomy, we collaborate with senior radiologists to propose a comprehensive hierarchical classification system covering 404 representative abnormal findings across all body regions; (ii) On data, we contribute a dataset of over 14.5K CT images spanning multiple planes and all human body regions, with meticulous grounding annotations for over 19K abnormalities, each linked to a detailed description and mapped into the taxonomy; (iii) On model development, we propose OminiAbnorm-CT, which can automatically ground and describe abnormal findings on multi-plane and whole-body CT images based on text queries, while also allowing flexible interaction through visual prompts; (iv) On benchmarks, we establish three representative evaluation tasks based on real clinical scenarios. Extensive experiments show that OminiAbnorm-CT significantly outperforms existing methods on all tasks and metrics.