🤖 AI Summary
This work addresses the mismatch between the voxel-wise regression losses conventionally used in deep-learning-based 3D dose prediction and the clinical evaluation criteria based on dose-volume histogram (DVH) metrics. The authors propose a clinical DVH metric (CDM) loss that incorporates differentiable D-metrics and surrogate V-metrics directly into the loss function, enabling end-to-end optimization of clinically used DVH objectives. A lossless bit-mask ROI encoding is also introduced to improve training efficiency. Evaluated on a head-and-neck cancer dataset of 174 patients with a standard 3D U-Net, the method reduces the PTV Score from 1.544 (MAE) to 0.491 (MAE + CDM), satisfies all clinical constraints, improves target coverage while keeping OAR sparing comparable, cuts training time by 83%, and lowers GPU memory usage.
📝 Abstract
Purpose: Deep-learning-based three-dimensional (3D) dose prediction is widely used in automated radiotherapy workflows. However, most existing models are trained with voxel-wise regression losses, which are poorly aligned with clinical plan evaluation criteria based on dose-volume histogram (DVH) metrics. This study aims to develop a clinically guided loss formulation that directly optimizes clinically used DVH metrics while remaining computationally efficient for head and neck (H&N) dose prediction.
Methods: We propose a clinical DVH metric loss (CDM loss) that incorporates differentiable D-metrics and surrogate V-metrics, together with a lossless bit-mask region-of-interest (ROI) encoding to improve training efficiency. The method was evaluated on 174 H&N patients using a temporal split (137 training, 37 testing).
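The two metric families named in the Methods can be illustrated with a minimal NumPy sketch. It is not the authors' formulation: the function names, the sort-based D-metric, the sigmoid surrogate, and the temperature `tau` are all assumptions made for illustration. Here D_x is taken as the minimum dose received by the hottest x% of an ROI, and V_d as the fraction of the ROI receiving at least dose d. The hard V_d uses a step function with zero gradient almost everywhere, which is why a smooth surrogate is needed for training.

```python
import numpy as np

def d_metric(dose, mask, x_percent):
    """D_x: minimum dose received by the hottest x% of the ROI volume.

    Implemented by sorting the ROI doses. In an autodiff framework the
    same selection passes gradients to the selected voxel.
    """
    roi = np.sort(dose[mask])[::-1]                  # ROI doses, descending
    k = int(np.ceil(x_percent * roi.size / 100.0))   # voxels in the hottest x%
    k = min(max(k, 1), roi.size)                     # keep the index in range
    return float(roi[k - 1])

def v_metric_hard(dose, mask, threshold):
    """V_d: fraction of ROI voxels receiving at least `threshold` (not differentiable)."""
    return float(np.mean(dose[mask] >= threshold))

def v_metric_soft(dose, mask, threshold, tau=0.1):
    """Smooth surrogate for V_d: the step function is replaced by a sigmoid.

    `tau` (hypothetical parameter) controls sharpness; as tau -> 0 the
    surrogate approaches the hard metric.
    """
    z = (dose[mask] - threshold) / tau
    return float(np.mean(0.5 * (1.0 + np.tanh(0.5 * z))))  # numerically stable sigmoid

# Toy ROI: 100 voxels with doses 1, 2, ..., 100 Gy.
dose = np.arange(1.0, 101.0).reshape(4, 5, 5)
mask = np.ones_like(dose, dtype=bool)
d95 = d_metric(dose, mask, 95)                    # 95 voxels receive at least 6 Gy
v_hard = v_metric_hard(dose, mask, 50.5)          # 50 of 100 voxels exceed 50.5 Gy
v_soft = v_metric_soft(dose, mask, 50.5, tau=0.01)
```

A loss term could then penalize the gap between such metrics on the predicted and reference dose, or a shortfall against a clinical constraint; how CDM loss actually combines and weights these terms is not specified in this abstract.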
Results: Compared with MAE-based and DVH-curve-based losses, CDM loss substantially improved target coverage and satisfied all clinical constraints. Using a standard 3D U-Net, the planning target volume (PTV) Score was reduced from 1.544 with mean absolute error (MAE) alone to 0.491 with MAE + CDM, while organ-at-risk (OAR) sparing remained comparable. Bit-mask encoding reduced training time by 83% and lowered GPU memory usage.
Conclusion: Directly optimizing clinically used DVH metrics enables 3D dose predictions that are better aligned with clinical treatment planning criteria than conventional voxel-wise or DVH-curve-based supervision. The proposed CDM loss, combined with efficient ROI bit-mask encoding, provides a practical and scalable framework for H&N dose prediction.
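The abstract does not describe how the ROI bit-mask encoding is implemented. One plausible reading, sketched below in NumPy as an assumption, packs K binary ROI masks into a single integer volume, with ROI k stored in bit k of each voxel. Because every ROI owns its own bit, overlapping structures are recovered exactly, which is what makes the encoding lossless. The function names and the 32-ROI limit of `uint32` are illustrative choices.

```python
import numpy as np

def encode_rois(masks):
    """Pack K boolean masks of shape (K, D, H, W) into one uint32 volume (D, H, W)."""
    num_rois = masks.shape[0]
    if num_rois > 32:
        raise ValueError("uint32 can hold at most 32 ROI bits")
    packed = np.zeros(masks.shape[1:], dtype=np.uint32)
    for bit in range(num_rois):
        # Set bit `bit` wherever ROI `bit` is present.
        packed |= masks[bit].astype(np.uint32) << np.uint32(bit)
    return packed

def decode_roi(packed, bit):
    """Recover the boolean mask of a single ROI from the packed volume."""
    return ((packed >> np.uint32(bit)) & np.uint32(1)).astype(bool)

# Five random, overlapping ROIs on a small 4 x 4 x 4 grid.
rng = np.random.default_rng(0)
masks = rng.random((5, 4, 4, 4)) > 0.5
packed = encode_rois(masks)
recovered = np.stack([decode_roi(packed, b) for b in range(masks.shape[0])])
```

Under this reading, a single integer volume replaces K separate mask channels during data loading, and individual masks are decoded only when a metric for that ROI is computed. Whether this accounts for the reported savings in training time and GPU memory is not stated in the abstract.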