Test-Time Learning and Inference-Time Deliberation for Efficiency-First Offline Reinforcement Learning in Care Coordination and Population Health Management

📅 2025-09-19
🤖 AI Summary
This study addresses scalable health management for Medicaid and safety-net populations, jointly optimizing time cost, clinical risk, and auditability across outreach modalities (SMS, phone calls, video visits, and in-person encounters). We propose TTL+ITD, a lightweight offline reinforcement learning framework that combines test-time learning (TTL) through local neighborhood calibration with inference-time deliberation (ITD) through a small Q-ensemble, explicitly modeling both prediction uncertainty and time cost. Tunable parameters (neighborhood size and the uncertainty and cost penalties) make efficiency–fairness trade-offs explicit and enable subgroup-level impact auditing. The resulting policies remain interpretable and the training pipeline remains traceable. Evaluated on real-world de-identified operational data, TTL+ITD yields stable value estimates and fine-grained subgroup effect assessments, supporting accountable, equitable, and clinically actionable decision-making.
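The summary mentions subgroup-level impact auditing but does not say how it is computed. A minimal sketch, assuming each scored member yields a tuple of (subgroup label, recommended modality, estimated value): the audit reports, per subgroup, the count, the mean estimated value, and the share of each modality recommended. The function name `audit_by_subgroup` and the `MODALITIES` labels are hypothetical illustrations, not the authors' code.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical modality labels; the paper's actual action set may differ.
MODALITIES = ("sms", "phone", "video", "in_person")


def audit_by_subgroup(records):
    """Summarize policy recommendations per subgroup (illustrative sketch).

    records: iterable of (subgroup, recommended_action, estimated_value) tuples.
    Returns {subgroup: {"n", "mean_value", "action_share"}}; actions outside
    MODALITIES are counted in "n" but not reported in "action_share".
    """
    values = defaultdict(list)
    actions = defaultdict(Counter)
    for group, action, value in records:
        values[group].append(value)
        actions[group][action] += 1

    report = {}
    for group, vals in values.items():
        n = len(vals)
        report[group] = {
            "n": n,
            "mean_value": mean(vals),
            # Fraction of this subgroup's members assigned each modality.
            "action_share": {a: actions[group][a] / n for a in MODALITIES},
        }
    return report


if __name__ == "__main__":
    demo = [("A", "sms", 1.0), ("A", "phone", 0.5), ("B", "in_person", 0.2)]
    for group, row in audit_by_subgroup(demo).items():
        print(group, row)
```

Comparing `mean_value` and `action_share` across subgroups is one simple way to surface whether a policy systematically routes some groups to lower-effort modalities.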

📝 Abstract
Care coordination and population health management programs serve large Medicaid and safety-net populations and must be auditable, efficient, and adaptable. While clinical risk for outreach modalities is typically low, time and opportunity costs differ substantially across text, phone, video, and in-person visits. We propose a lightweight offline reinforcement learning (RL) approach, TTL+ITD, that augments trained policies with (i) test-time learning (TTL) via local neighborhood calibration and (ii) inference-time deliberation (ITD) via a small Q-ensemble that incorporates predictive uncertainty and time/effort cost. The method exposes transparent dials for neighborhood size and for the uncertainty and cost penalties, and it preserves an auditable training pipeline. Evaluated on a de-identified operational dataset, TTL+ITD achieves stable value estimates, predictable efficiency trade-offs, and support for subgroup auditing.
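The abstract does not spell out the local neighborhood calibration rule. One plausible reading, sketched here purely as an assumption: find the k logged records nearest to the query state, compute the residual between observed return and the model's predicted Q for each logged action, and shift the query's Q-value for that action by the mean residual. Here `k` plays the role of the neighborhood-size dial; `calibrate_q` and the record layout are hypothetical.

```python
import math
from statistics import mean


def calibrate_q(query_state, q_query, logged, k=10):
    """Test-time neighborhood calibration (hypothetical sketch).

    query_state: tuple of numeric features for the member being scored.
    q_query: list of model Q-values for the query state, one per action.
    logged: list of (state, action, observed_return, predicted_q) tuples,
            where predicted_q is the model's Q for that logged (state, action).
    k: neighborhood size (the "dial").
    """
    # k nearest logged records by Euclidean distance in feature space.
    neighbors = sorted(logged, key=lambda row: math.dist(row[0], query_state))[:k]

    calibrated = list(q_query)
    for a in range(len(q_query)):
        residuals = [ret - pred for _, act, ret, pred in neighbors if act == a]
        if residuals:  # only adjust actions actually observed nearby
            calibrated[a] += mean(residuals)
    return calibrated


if __name__ == "__main__":
    logged = [((0.0,), 0, 1.0, 0.5), ((1.0,), 1, 2.0, 2.5), ((10.0,), 0, 5.0, 0.0)]
    print(calibrate_q((0.2,), [0.0, 0.0], logged, k=2))
```

Smaller `k` makes the correction more local but noisier; actions with no nearby logged examples keep their uncalibrated values under this sketch.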
Problem

Optimizing care coordination efficiency across different outreach modalities
Developing auditable reinforcement learning for population health management
Balancing clinical risk with time and opportunity costs
Innovation

Test-time learning via local neighborhood calibration
Inference-time deliberation using small Q-ensemble
Incorporates predictive uncertainty and time cost
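The list above names inference-time deliberation with a small Q-ensemble that accounts for uncertainty and time cost, without giving a formula. A minimal sketch, assuming a common pessimistic scoring rule (ensemble mean minus a penalty on ensemble standard deviation minus a penalty on time cost) rather than the authors' exact objective: each action scores mean(Q) − `lambda_unc` · std(Q) − `lambda_cost` · cost, and the highest score wins. The names `deliberate`, `lambda_unc`, and `lambda_cost` are hypothetical.

```python
from statistics import mean, pstdev


def deliberate(ensemble_q, time_cost, lambda_unc=1.0, lambda_cost=1.0):
    """Uncertainty- and cost-penalized action selection (illustrative sketch).

    ensemble_q: list of M lists, each with Q-values for the same A actions.
    time_cost: list of A nonnegative time/effort costs (hypothetical units).
    Returns (best_action_index, scores).
    """
    n_actions = len(time_cost)
    scores = []
    for a in range(n_actions):
        qs = [member[a] for member in ensemble_q]
        # Penalize disagreement across ensemble members and staff time.
        score = mean(qs) - lambda_unc * pstdev(qs) - lambda_cost * time_cost[a]
        scores.append(score)
    best = max(range(n_actions), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    ensemble = [[1.0, 2.0], [1.0, 4.0]]  # 2 members, 2 actions
    costs = [0.0, 0.5]  # e.g. SMS vs. phone call
    print(deliberate(ensemble, costs))
    print(deliberate(ensemble, costs, lambda_cost=3.0))
```

Raising `lambda_cost` shifts choices toward cheaper modalities, and raising `lambda_unc` shifts them toward actions the ensemble agrees on; exposing both as explicit dials is what makes the trade-off inspectable.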
Sanjay Basu
Waymark, San Francisco, CA, USA
Sadiq Y. Patel
Waymark, San Francisco, CA, USA
Parth Sheth
University of Pennsylvania
Bhairavi Muralidharan
Waymark, San Francisco, CA, USA
Namrata Elamaran
Waymark, San Francisco, CA, USA
Aakriti Kinra
Waymark, San Francisco, CA, USA
Rajaie Batniji
Waymark, San Francisco, CA, USA