🤖 AI Summary
Oral diseases affect nearly 3.5 billion people worldwide, yet a systematic understanding of the potential and limitations of large AI models in dental clinical practice is still lacking. To address this gap, this study conducts a systematic review of 97 works following PRISMA-ScR guidelines and proposes a novel two-dimensional classification framework that combines “architectural paradigm” with “dental specialization level.” The framework unifies diverse approaches, including language generation models (e.g., OralGPT), vision-based discriminative models (e.g., variants of SAM and CLIP), and multimodal dentistry-specific architectures (e.g., DentVFM, DentVLM). The analysis reveals complementary strengths between general-purpose and specialized models, finds that integrated multi-model pipelines substantially outperform single-model systems, and shows that specialized models achieve superior performance on complex multimodal tasks. The study further identifies hallucination, scarcity of annotated data, and the absence of clinical evaluation benchmarks as key deployment barriers, alongside a critical data asymmetry stemming from insufficient dental textual corpora.
📝 Abstract
Background: Oral diseases affect nearly 3.5 billion people worldwide, yet the comparative clinical potential of large-scale AI models in dentistry remains poorly understood. Three distinct model categories have emerged: language-generative models, discriminative vision foundation models, and dental-specific foundation models. No unified review has yet examined their relationships and collective limitations.
Methods: Following PRISMA-ScR guidelines, we systematically searched four databases (PubMed, Google Scholar, Scopus, arXiv), and two reviewers independently screened the results. After applying inclusion and exclusion criteria, 97 studies (2020-2026) were included. We propose a two-dimensional classification framework that organizes models by architectural paradigm and degree of dental specialization.
Results: Language-generative models excel at text-based tasks (clinical reasoning, licensing exams, patient communication) but show inconsistent performance on image-dependent diagnostics. Adapted SAM and CLIP variants achieve strong results in tooth segmentation and lesion detection. Dental-specific models (DentVFM, DentVLM, OralGPT) demonstrate the strongest performance on complex multimodal tasks. Integrated pipelines consistently outperform single-model approaches. A data asymmetry is observed: dental-specific pretraining is concentrated almost entirely in the vision domain, reflecting the scarcity of large-scale dental text corpora.
Conclusions: General-purpose and dental-specific models play complementary roles; the most effective systems combine both within structured pipelines. Safe autonomous deployment requires resolving three persistent barriers: hallucination in generative models, limited annotated dental datasets, and the absence of standardized clinical evaluation benchmarks.