🤖 AI Summary
Photoacoustic imaging (PAI) lacks dedicated image quality assessment (IQA) methods and annotated benchmark datasets. Method: We introduce the first multi-attribute, expert-scored PAI dataset, comprising 1,134 reconstructed images rated on five quality dimensions and supporting the development of both full-reference and no-reference IQA measures. Highly characterised imaging test objects provide the ground truth needed for full-reference assessment, and two domain experts rated the images so that objective IQA measures can be evaluated systematically against subjective scores. Results: HaarPSI$_{med}$ achieves a Spearman rank correlation of 0.83 with the expert ratings, significantly outperforming SSIM (0.62). The dataset is publicly released, addressing the shortage of quality-rated data for PAI and for medical IQA more broadly.
📝 Abstract
Image quality assessment (IQA) is crucial in the evaluation stage of novel algorithms operating on images, including traditional and machine-learning-based methods. Because quality-rated medical images are scarce, the most commonly used IQA methods that employ reference images (i.e., full-reference IQA) have been developed and tested on natural images. The inconsistencies reported when such measures are applied to medical images are therefore not surprising, as medical images have different properties than natural images. In photoacoustic imaging (PAI) in particular, standard benchmarking approaches for assessing the quality of image reconstructions are lacking. PAI is a multi-physics imaging modality in which two inverse problems have to be solved, which makes the application of IQA measures uniquely challenging because both acoustic and optical artifacts occur.
To support the development and testing of full- and no-reference IQA measures, we assembled PhotIQA, a dataset consisting of 1134 reconstructed photoacoustic (PA) images that were rated by two experts across five quality properties (overall quality, edge visibility, homogeneity, inclusion and background intensity); this detailed rating enables usage beyond PAI. To allow full-reference assessment, highly characterised imaging test objects were used, providing a ground truth. Our baseline experiments show that HaarPSI$_{med}$ significantly outperforms SSIM in correlating with the quality ratings (Spearman rank correlation coefficient, SRCC: 0.83 vs. 0.62). The dataset is publicly available at https://doi.org/10.5281/zenodo.13325196.
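For readers unfamiliar with the SRCC figures quoted above, the following is a minimal sketch of how a Spearman rank correlation between an IQA measure and expert ratings can be computed. It is not the authors' evaluation code: the function names (`average_ranks`, `pearson`, `srcc`) and the five example values are hypothetical and do not come from PhotIQA. The sketch assigns tied values their average rank and then takes the Pearson correlation of the two rank vectors, which is the standard definition of Spearman's coefficient in the presence of ties.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def srcc(metric_scores, expert_ratings):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(metric_scores) != len(expert_ratings):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(metric_scores), average_ranks(expert_ratings))


# Hypothetical toy data (NOT taken from PhotIQA): one metric value and one
# expert rating per image.
example_metric = [0.31, 0.45, 0.40, 0.80, 0.77]
example_ratings = [1, 2, 2, 4, 5]
example_srcc = srcc(example_metric, example_ratings)
```

Because SRCC depends only on rank order, it rewards a measure for ordering images the way the experts do, regardless of the scale on which the measure reports its values. This is why it is a common choice for comparing measures such as SSIM and HaarPSI$_{med}$ against subjective scores.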