AI Summary
This study addresses the clinical challenge of assessing the risk of inferior alveolar nerve injury due to overlap between mandibular third molars and the mandibular canal, aiming to minimize unnecessary CBCT scans while ensuring accurate evaluation. Leveraging panoramic radiographs, it presents the first systematic comparison of local, centralized, and federated learning paradigms for a binary classification task in dental imaging. Using a pretrained ResNet-34 model, together with Grad-CAM visualization, client-side threshold optimization, and server-side aggregation monitoring, the work comprehensively evaluates generalization performance and training dynamics across approaches. Results demonstrate that centralized learning achieves the highest performance (AUC 0.831, accuracy 0.782), followed by federated learning (AUC 0.757, accuracy 0.703), which significantly outperforms local learning (mean AUC 0.672), thereby supporting the clinical feasibility of federated learning under strict data privacy constraints.
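The federated setup described above — clients train locally and a server aggregates their updates — is most commonly implemented with FedAvg-style weighted averaging. The summary does not state the exact aggregation rule used, so the following is a minimal sketch assuming sample-size-weighted averaging of client parameter vectors; the function name `fedavg` and the plain list-of-floats parameter representation are illustrative only.

```python
def fedavg(client_weights, client_sizes):
    """Sample-size-weighted average of client parameters (FedAvg-style sketch).

    client_weights: one dict per client, {param_name: list of floats}
    client_sizes: number of training samples held by each client
    """
    total = sum(client_sizes)
    aggregated = {}
    for name in client_weights[0]:
        # Each aggregated parameter is the weighted mean of the clients' values.
        aggregated[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(client_weights[0][name]))
        ]
    return aggregated

# Toy round with two clients holding 1 and 3 samples:
print(fedavg([{"w": [1.0]}, {"w": [3.0]}], [1, 3]))  # {'w': [2.5]}
```

In a real ResNet-34 deployment the same rule would be applied to every tensor in the model's parameter dictionary each communication round; only weights, never patient images, leave a client.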
Abstract
Impaction of the mandibular third molar in proximity to the mandibular canal increases the risk of inferior alveolar nerve injury. Panoramic radiography is routinely used to assess this relationship. Automated classification of molar-canal overlap could support clinical triage and reduce unnecessary CBCT referrals, while federated learning (FL) enables multi-center collaboration without sharing patient data. We compared Local Learning (LL), FL, and Centralized Learning (CL) for binary overlap/no-overlap classification on cropped panoramic radiographs partitioned across eight independent labelers. A pretrained ResNet-34 was trained under each paradigm and evaluated using per-client metrics with locally optimized thresholds and pooled test performance with a global threshold. Performance was assessed using area under the receiver operating characteristic curve (AUC) and threshold-based metrics, alongside training dynamics, Grad-CAM visualizations, and server-side aggregate monitoring signals. On the test set, CL achieved the highest performance (AUC = 0.831; accuracy = 0.782), FL showed intermediate performance (AUC = 0.757; accuracy = 0.703), and LL generalized poorly across clients (AUC range = 0.619–0.734; mean = 0.672). Training curves suggested overfitting, particularly in LL models, and Grad-CAM indicated more anatomically focused attention in CL and FL. Overall, centralized training provided the strongest performance, while FL offered a privacy-preserving alternative that outperformed LL.
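The evaluation above combines AUC with locally optimized decision thresholds. The abstract does not state the optimization criterion; a common choice is maximizing Youden's J (sensitivity + specificity − 1) on held-out scores, so the sketch below assumes that criterion and implements AUC via the Mann-Whitney rank statistic. Function names are illustrative, not the paper's code.

```python
def auc_score(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic (ties counted as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def best_threshold(labels, scores):
    """Threshold maximizing Youden's J = sensitivity + specificity - 1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = 0.5, -1.0
    for t in sorted(set(scores)):
        tp = sum(y == 1 and s >= t for y, s in zip(labels, scores))
        tn = sum(y == 0 and s < t for y, s in zip(labels, scores))
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t

# Toy client: two overlap cases (label 1) and two non-overlap cases (label 0).
labels, scores = [0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]
print(auc_score(labels, scores))      # 0.75
print(best_threshold(labels, scores)) # 0.35
```

In the per-client protocol, `best_threshold` would be fit on each client's own validation scores, while the pooled test evaluation would apply one global threshold to all clients' scores.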