🤖 AI Summary
Publicly available chest X-ray datasets underrepresent critical combinations of clinical features, which limits how well AI models generalize in high-stakes clinical settings. To address this, this work proposes the CARS framework, which couples anatomical structure preservation with precise manipulation of clinical attributes. By applying targeted perturbations to clinical feature vectors while preserving anatomical fidelity, CARS enables controllable generation of synthetic images that either include or exclude specific pathologies. Fine-tuning seven backbone architectures on CARS-generated subsets improves precision-recall performance, reduces predictive uncertainty, and improves calibration on a held-out MIMIC-CXR benchmark relative to prior feature perturbation approaches. Evaluations by two expert radiologists confirm that the generated images are realistic and clinically consistent.
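To make the idea of inserting and deleting findings concrete, the following is a minimal sketch, not the CARS implementation. It assumes the clinical feature vector can be represented as a mapping from finding names to values in [0, 1]; the function name `perturb_findings` and the finding names are hypothetical, and the image generator that would consume the edited vector (along with any anatomical constraint) is out of scope here.

```python
from typing import Dict, Iterable


def perturb_findings(
    features: Dict[str, float],
    insert: Iterable[str] = (),
    delete: Iterable[str] = (),
) -> Dict[str, float]:
    """Return a copy of `features` with the named findings switched on or off.

    Hypothetical helper: findings in `insert` are set to 1.0, findings in
    `delete` are set to 0.0, and every other entry is left unchanged.
    """
    insert, delete = set(insert), set(delete)
    unknown = (insert | delete) - set(features)
    if unknown:
        raise KeyError(f"unknown findings: {sorted(unknown)}")
    if insert & delete:
        raise ValueError("a finding cannot be both inserted and deleted")

    edited = dict(features)  # copy, so the source vector is not mutated
    for name in insert:
        edited[name] = 1.0
    for name in delete:
        edited[name] = 0.0
    return edited


# Illustrative vector with made-up finding names.
source = {"cardiomegaly": 1.0, "pleural_effusion": 0.0, "edema": 0.0}
target = perturb_findings(
    source, insert=["pleural_effusion"], delete=["cardiomegaly"]
)
```

Only the targeted entries change, which mirrors the stated goal of editing specific pathologies while leaving everything else about the case untouched.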
📝 Abstract
The clinical deployment of AI diagnostic models demands more than benchmark accuracy: it demands robustness across the full spectrum of disease presentations. However, publicly available chest radiographic datasets systematically underrepresent critical clinical feature combinations, leaving models under-trained precisely where clinical stakes are highest. We present CARS, a clinically aware and anatomically grounded framework that addresses this gap through principled synthetic image generation. CARS applies targeted perturbations to clinical feature vectors, enabling controlled insertion and deletion of pathological findings while explicitly preserving anatomical structure. We evaluate CARS across seven backbone architectures by fine-tuning models on synthetic subsets and testing on a held-out MIMIC-CXR benchmark. Compared to prior feature perturbation approaches, fine-tuning on CARS-generated images consistently improves precision-recall performance, reduces predictive uncertainty, and improves model calibration. Structural and semantic analyses demonstrate high anatomical fidelity, strong feature alignment, and low semantic uncertainty. Independent evaluation by two expert radiologists further confirms realism and clinical agreement. As the field moves toward regulated clinical AI, CARS demonstrates that anatomically faithful synthetic data generation, aimed at better feature space coverage, is a viable and effective strategy for improving both the performance and trustworthiness of chest X-ray classification systems without compromising clinical integrity.
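The abstract reports improved calibration without naming a metric. As an illustration only, the sketch below computes expected calibration error (ECE) for a single binary finding, a common way to quantify calibration. The choice of ECE, the equal-width binning, and the function name are assumptions and are not taken from the paper.

```python
from typing import Sequence


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Binary ECE: weighted mean gap between predicted probability and
    observed positive rate, computed over equal-width probability bins."""
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and equal length")

    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # p == 1.0 would index past the end, so clamp to the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_prob = sum(p for p, _ in bucket) / len(bucket)
        pos_rate = sum(y for _, y in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(mean_prob - pos_rate)
    return ece


# Perfectly calibrated toy predictions give zero error;
# confidently wrong predictions give the maximum error of 1.
well_calibrated = expected_calibration_error([0.0, 0.0, 1.0, 1.0], [0, 0, 1, 1])
badly_calibrated = expected_calibration_error([1.0, 1.0], [0, 0])
```

A lower value means predicted probabilities track observed frequencies more closely, which is the sense in which fine-tuning on synthetic data could be said to improve calibration.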