medR: Reward Engineering for Clinical Offline Reinforcement Learning via Tri-Drive Potential Functions

📅 2026-02-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge in clinical offline reinforcement learning where reward functions are typically handcrafted using heuristic rules, limiting their generalizability across diverse disease contexts. To overcome this limitation, the study introduces an automated reward engineering framework that leverages large language models for the first time in this domain. The framework constructs a potential function based on three clinically meaningful dimensions—survivability, confidence, and capability—to generate reward signals. Prior to deployment, candidate reward structures are quantitatively evaluated and optimized through a principled selection process. Empirical results demonstrate that this approach enables disease-specific yet generalizable reward design, significantly improving policy performance across multiple clinical scenarios and validating the effectiveness and broad applicability of automated reward generation and evaluation.

📝 Abstract
Reinforcement Learning (RL) offers a powerful framework for optimizing dynamic treatment regimes (DTRs). However, clinical RL is fundamentally bottlenecked by reward engineering: the challenge of defining signals that safely and effectively guide policy learning in complex, sparse offline environments. Existing approaches often rely on manual heuristics that fail to generalize across diverse pathologies. To address this, we propose an automated pipeline leveraging Large Language Models (LLMs) for offline reward design and verification. We formulate the reward function using a potential function consisting of three core components: survival, confidence, and competence. We further introduce quantitative metrics to rigorously evaluate and select the optimal reward structure prior to deployment. By integrating LLM-driven domain knowledge, our framework automates the design of reward functions for specific diseases while significantly enhancing the performance of the resulting policies.
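The tri-component potential function described in the abstract can be sketched with standard potential-based reward shaping. The component scores, weights, and function names below are illustrative assumptions for exposition, not the paper's actual design:

```python
# Hedged sketch: potential-based reward shaping with a tri-component
# potential Phi(s), following the survival/confidence/competence split
# named in the abstract. All weights and scores here are hypothetical.

def potential(state, w_survival=1.0, w_confidence=0.5, w_competence=0.5):
    """Combine three clinically motivated terms into one potential Phi(s).

    Each term is assumed to be a pre-computed score in [0, 1]:
      - survival:   estimated probability the patient survives
      - confidence: certainty of the state assessment
      - competence: estimated physiological capacity
    """
    return (w_survival * state["survival"]
            + w_confidence * state["confidence"]
            + w_competence * state["competence"])

def shaped_reward(r, s, s_next, gamma=0.99):
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).

    This form is known to preserve the optimal policy of the
    underlying MDP, which is why it is a natural vehicle for
    LLM-generated reward components.
    """
    return r + gamma * potential(s_next) - potential(s)

# Example transition: the patient's survival estimate improves,
# so the shaped reward for this step is positive even if the
# sparse environment reward r is zero.
s      = {"survival": 0.6, "confidence": 0.7, "competence": 0.5}
s_next = {"survival": 0.8, "confidence": 0.7, "competence": 0.5}
print(round(shaped_reward(0.0, s, s_next), 4))  # → 0.186
```

Because shaping with a state-only potential is policy-invariant, candidate LLM-generated potentials can be swapped in and compared offline, which matches the paper's evaluate-then-select pipeline.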
Problem

Research questions and friction points this paper is trying to address.

reward engineering
clinical reinforcement learning
offline reinforcement learning
dynamic treatment regimes
reward function
Innovation

Methods, ideas, or system contributions that make the work stand out.

reward engineering
offline reinforcement learning
large language models
dynamic treatment regimes
potential functions
Qianyi Xu
National University of Singapore, Singapore
Gousia Habib
Finnish Center for Artificial Intelligence, University of Helsinki, Finland
Feng Wu
National University of Singapore
Machine Learning · Medical Time Series
Yanrui Du
Harbin Institute of Technology
LLMs · Safety · Medical Domain
Zhihui Chen
National University of Singapore, Singapore
Swapnil Mishra
Assistant Professor at National University of Singapore
Bayesian Inference · Point Processes · Infectious Diseases Epidemiology · Computational Social Science · COVID-19
Dilruk Perera
National University of Singapore, Singapore
Mengling Feng
National University of Singapore, Singapore