🤖 AI Summary
This work addresses three limitations of large language models (LLMs) in radiotherapy planning: a lack of physical intuition, difficulty converging, and poor generalization. It introduces a machine-to-machine knowledge-guided framework that combines physics-aware deep reinforcement learning with LLM-based reasoning. Through in-context learning, treatment planning parameter distributions discovered by the reinforcement learning agent are injected into the LLM, which restores the causal relationship between parameter adjustments and dosimetric outcomes and enables autonomous iterative optimization without human intervention. Evaluated on prostate and liver radiotherapy scenarios, the method significantly reduces the number of optimization iterations relative to unguided planning, consistently generates high-quality plans, and learns a hierarchical decision policy that prioritizes planning objectives. It also generalizes across tumor sites, anatomical structures, and initial plan qualities.
📝 Abstract
In this work, we propose a prototype machine-to-machine (M2M) knowledge-guided Large Language Model (LLM) framework for automated radiotherapy treatment planning. In the proposed paradigm, Treatment Planning Parameter (TPP) distribution knowledge discovered by a Deep Reinforcement Learning (DRL) agent is transferred to an LLM agent through in-context learning, enabling autonomous iterative planning without human intervention. While standard LLM-based planning often lacks physical intuition and struggles with convergence, the integration of DRL-derived guidance constrains the agent to a physically valid parameter space. Experimental evaluations are performed across three diverse planning scenarios: basic prostate cases, complex prostate configurations with increased organ-at-risk (OAR) constraints, and liver cases. The evaluation results demonstrate that the guided LLM agent consistently achieves optimal planning scores while significantly reducing the number of iterations compared to unguided planning. Analysis of the final TPP configurations reveals that the agent successfully learns a hierarchical priority of objectives, effectively restoring a logical "cause-and-effect" relationship between parameter tuning and dosimetric outcomes. Crucially, this prototype framework exhibits robust generalizability, maintaining high planning quality regardless of specific patient anatomy, treatment site, or initial plan quality. By bridging the specialized optimization of DRL with the adaptive reasoning of LLMs, this M2M framework establishes a scalable foundation towards generalizable autonomous treatment planning, ultimately benefiting clinical practice in realistic environments.
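The knowledge-transfer step described above, in which TPP distribution knowledge from a DRL agent reaches the LLM through in-context learning, might look roughly like the following minimal sketch. It is not the authors' implementation: the function names (`summarize_tpp`, `build_guidance_prompt`), the parameter names (`ptv_weight`, `rectum_weight`), the numeric values, and the prompt wording are all hypothetical, and the mean/standard-deviation summary is only one assumed way of representing a "distribution".

```python
from statistics import mean, stdev


def summarize_tpp(samples):
    """Summarize DRL-derived TPP samples as (mean, std) per parameter.

    `samples` is a list of dicts mapping a parameter name to its value.
    At least two samples are needed for the sample standard deviation.
    """
    names = sorted(samples[0])
    summary = {}
    for name in names:
        values = [s[name] for s in samples]
        summary[name] = (mean(values), stdev(values))
    return summary


def build_guidance_prompt(summary, current_tpp):
    """Format the summary and the current plan's TPPs as in-context guidance."""
    lines = ["Reference TPP ranges from a DRL agent (mean +/- std):"]
    for name, (mu, sigma) in summary.items():
        lines.append(f"- {name}: {mu:.2f} +/- {sigma:.2f}")
    lines.append("Current TPPs:")
    for name, value in sorted(current_tpp.items()):
        lines.append(f"- {name}: {value:.2f}")
    lines.append(
        "Propose the next TPP adjustment and explain its expected dosimetric effect."
    )
    return "\n".join(lines)


# Hypothetical, made-up values purely for illustration.
drl_samples = [
    {"ptv_weight": 1.0, "rectum_weight": 0.4},
    {"ptv_weight": 3.0, "rectum_weight": 0.6},
]
prompt = build_guidance_prompt(
    summarize_tpp(drl_samples),
    {"ptv_weight": 1.5, "rectum_weight": 0.5},
)
print(prompt)
```

In an iterative loop, the resulting text would be prepended to each LLM query so that proposed adjustments stay near the parameter region the DRL agent found useful; how the actual framework encodes and updates this guidance is not specified in the abstract.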