🤖 AI Summary
Existing 3D approaches typically predict at the patient or organ level, which makes it difficult to characterize the clinical attributes of individual lesions in renal CT scans. This work proposes LesionDETR, a framework that formulates lesion characterization as a set-prediction task with a variable number of lesions per kidney. This enables end-to-end, multi-granular prediction of four clinically relevant per-lesion attributes (type, size, enhancement, and attenuation) and supports structured report generation. Built on the DETR architecture, LesionDETR takes segmentation masks as additional input channels and uses abdominal domain-specific pretraining (SuPreM). Across the four input representations and six encoder initializations evaluated, these two choices matter most. The framework also introduces a size- and distance-aware Hungarian matching scheme and a hierarchical loss function. The model achieves a bilateral side-level abnormality AUC of 0.799 on the UF-Health dataset and 0.817 in zero-shot evaluation on KiTS23. A count-conditioned variant reaches a per-lesion mAP of 0.190 on cystic lesions, while performance on rare solid lesions is limited by data scarcity.
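To make the matching step concrete, here is a minimal sketch of size- and distance-aware Hungarian matching between ground-truth lesions and predicted slots. The summary does not give LesionDETR's actual cost. The three cost terms (slot existence probability, center distance scaled by a hypothetical `scale_mm`, and the absolute log-ratio of diameters), all weights, and the `Lesion`/`Slot` types are illustrative assumptions. The solver is a standard Kuhn-Munkres (Hungarian) implementation with potentials for n ≤ m.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


@dataclass
class Lesion:
    center: Point3      # ground-truth centroid in mm (hypothetical representation)
    diameter: float     # ground-truth diameter in mm


@dataclass
class Slot:
    center: Point3      # predicted centroid in mm
    diameter: float     # predicted diameter in mm
    exist_prob: float   # predicted probability that this slot holds a lesion


def match_cost(gt: Lesion, slot: Slot, w_exist: float = 1.0, w_dist: float = 1.0,
               w_size: float = 1.0, scale_mm: float = 10.0) -> float:
    """Assumed size- and distance-aware cost; lower means a better match."""
    dist = math.dist(gt.center, slot.center) / scale_mm
    size = abs(math.log(gt.diameter / slot.diameter))
    return -w_exist * slot.exist_prob + w_dist * dist + w_size * size


def hungarian(cost: Sequence[Sequence[float]]) -> List[int]:
    """Kuhn-Munkres with potentials on an n x m matrix (n <= m).

    Returns, for each row, the column assigned to it at minimum total cost.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (m + 1)   # column potentials (1-indexed)
    p = [0] * (m + 1)     # p[j] = row matched to column j (0 = free)
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Walk the augmenting path back to the root, flipping matches.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def match_lesions(gts: Sequence[Lesion], slots: Sequence[Slot]) -> List[int]:
    """Assign each ground-truth lesion to one slot index."""
    if not gts:
        return []
    cost = [[match_cost(g, s) for s in slots] for g in gts]
    return hungarian(cost)
```

As in DETR, slots left unmatched would then be supervised toward "no lesion", so a kidney with k lesions trains exactly k slots on attributes.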
📝 Abstract
Radiology reports describe kidney lesions by type, size, enhancement, and attenuation, yet existing 3D methods predict only at the patient or organ level. We reformulate kidney CT characterization as a per-lesion set-prediction task: one model emits a variable number of lesions per kidney, each with four clinical attributes. We curated 2,619 CT volumes from 788 patients at one academic medical center, with multi-granularity side- and per-lesion labels, and used KiTS23 (489 cases) for zero-shot external validation. We propose LesionDETR, a DETR-style architecture with size-distance Hungarian matching and a hierarchical loss that aggregates per-slot outputs to side-level objectives. Across four input representations and six encoder initializations, two design choices dominate: a segmentation mask as an input channel, and same-domain abdominal pretraining (SuPreM); generic large-corpus pretraining is no better than random initialization. LesionDETR reaches bilateral side-level abnormality AUC $0.799 \pm 0.009$ on UF-Health and $0.817 \pm 0.072$ on KiTS23. A count-conditioned variant reaches per-lesion mAP $0.190 \pm 0.083$ on cystic lesions; rare solid-lesion AP stays at the noise floor, pointing to targeted data collection, not architecture, as the next bottleneck. The framework yields verified per-lesion predictions for downstream structured report generation.
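The abstract says the hierarchical loss aggregates per-slot outputs into side-level objectives but does not say how. The sketch below assumes a noisy-OR over per-slot existence probabilities for side-level abnormality: a side counts as abnormal if at least one slot holds a lesion. It also assumes an L1 penalty between the expected slot count and the labeled lesion count, and a top-k readout as one possible reading of the count-conditioned variant. The noisy-OR, `w_count`, and `top_k_slots` are all hypothetical, not LesionDETR's confirmed method.

```python
import math
from typing import List, Sequence

EPS = 1e-7  # clamp to keep log() finite


def side_abnormal_prob(exist_probs: Sequence[float]) -> float:
    """Assumed noisy-OR: P(side abnormal) = 1 - prod(1 - p_i) over slots."""
    none = 1.0
    for p in exist_probs:
        none *= 1.0 - p
    return 1.0 - none


def bce(prob: float, target: float) -> float:
    """Binary cross-entropy on a single probability."""
    prob = min(max(prob, EPS), 1.0 - EPS)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def side_level_loss(exist_probs: Sequence[float], side_label: int,
                    lesion_count: int, w_count: float = 0.1) -> float:
    """Hypothetical side-level objective built from per-slot outputs."""
    p_side = side_abnormal_prob(exist_probs)
    expected_count = sum(exist_probs)  # expected number of occupied slots
    return bce(p_side, side_label) + w_count * abs(expected_count - lesion_count)


def top_k_slots(exist_probs: Sequence[float], k: int) -> List[int]:
    """Hypothetical count-conditioned readout: keep the k most confident slots."""
    order = sorted(range(len(exist_probs)), key=lambda i: exist_probs[i], reverse=True)
    return sorted(order[:k])
```

Because every term is differentiable in the slot probabilities (apart from the kink in the absolute value), a side-level label alone can still push gradient into individual slots. This is what makes side-only annotations usable alongside per-lesion ones.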