🤖 AI Summary
This study addresses safety and fairness challenges in offline reinforcement learning for Medicaid care management. We propose a grouped safety-threshold adaptation method that jointly performs group-aware safety calibration and fairness optimization, targeting either coverage or harm parity across protected subgroups, while preserving policy value through a feasibility-guided mechanism. The method is evaluated on de-identified longitudinal healthcare trajectory data against behavior cloning and HACO baselines, with bootstrapped 95% confidence interval estimation and subgroup-difference significance testing. Compared to a global safety-constrained baseline, the approach maintains comparable policy value while significantly improving fairness metrics (p < 0.01), demonstrating the practical feasibility of jointly ensuring safety and subgroup fairness in real-world Medicaid programs. The core contribution is to decouple a rigid global safety constraint into subgroup-sensitive, dynamically adjusted thresholds, with fairness optimization anchored in feasibility guarantees.
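To make the mechanism concrete, the sketch below illustrates one plausible reading of the two-stage idea: per-group conformal-style harm thresholds are calibrated first, then iteratively nudged toward coverage parity, with each adjustment accepted only if an off-policy value estimate stays above a floor (the feasibility guard). All names here (`calibrate_group_thresholds`, `equalize_coverage`, `value_fn`, `value_floor`) are hypothetical and assumed for illustration; they are not the paper's actual implementation.

```python
import numpy as np

def calibrate_group_thresholds(harm_scores, groups, alpha=0.1):
    """Per-group conformal-style thresholds: for each protected subgroup,
    take the finite-sample (1 - alpha) empirical quantile of calibration
    harm scores; actions scoring above the threshold would be vetoed.
    (Illustrative sketch; FG-FARL's exact calibration may differ.)"""
    thresholds = {}
    for g in np.unique(groups):
        scores_g = np.sort(harm_scores[groups == g])
        n = len(scores_g)
        # standard conformal quantile index: ceil((n + 1) * (1 - alpha))
        k = min(int(np.ceil((n + 1) * (1 - alpha))), n) - 1
        thresholds[g] = scores_g[k]
    return thresholds

def equalize_coverage(thresholds, harm_scores, groups, value_fn,
                      value_floor, step=0.01, max_iter=100):
    """Nudge per-group thresholds toward equal coverage (fraction of
    actions allowed) across subgroups, accepting a step only if the
    off-policy value estimate stays above `value_floor`.
    `value_fn(thresholds)` is an assumed user-supplied OPE callback."""
    for _ in range(max_iter):
        coverage = {g: float(np.mean(harm_scores[groups == g] <= t))
                    for g, t in thresholds.items()}
        lo = min(coverage, key=coverage.get)
        hi = max(coverage, key=coverage.get)
        if coverage[hi] - coverage[lo] < 1e-3:
            break  # coverage parity reached (within tolerance)
        trial = dict(thresholds)
        trial[lo] += step  # loosen the under-covered group's threshold
        trial[hi] -= step  # tighten the over-covered group's threshold
        if value_fn(trial) >= value_floor:  # feasibility-guided acceptance
            thresholds = trial
    return thresholds
```

The same loop could target harm parity instead by comparing per-group empirical harm rates rather than coverage; the feasibility check is what keeps either variant from trading away policy value.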
📝 Abstract
We introduce Feasibility-Guided Fair Adaptive Reinforcement Learning (FG-FARL), an offline RL procedure that calibrates per-group safety thresholds to reduce harm while equalizing a chosen fairness target (coverage or harm) across protected subgroups. Using de-identified longitudinal trajectories from a Medicaid population health management program, we evaluate FG-FARL against behavior cloning (BC) and HACO (Hybrid Adaptive Conformal Offline RL; a global conformal safety baseline). We report off-policy value estimates with bootstrap 95% confidence intervals and subgroup disparity analyses with p-values. FG-FARL achieves comparable value to baselines while improving fairness metrics, demonstrating a practical path to safer and more equitable decision support.
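The reported uncertainty quantification (bootstrap 95% confidence intervals on off-policy value, p-values on subgroup disparities) follows standard recipes; a minimal sketch is below. The estimator choices and function names are assumptions for illustration, not the paper's code.

```python
import numpy as np

def bootstrap_ci(returns, n_boot=2000, level=0.95, seed=0):
    """Percentile-bootstrap CI for the mean per-trajectory return
    (e.g., an importance-weighted off-policy value estimate)."""
    rng = np.random.default_rng(seed)
    returns = np.asarray(returns)
    boots = [rng.choice(returns, size=len(returns), replace=True).mean()
             for _ in range(n_boot)]
    lo, hi = np.percentile(boots, [(1 - level) / 2 * 100,
                                   (1 + level) / 2 * 100])
    return returns.mean(), (lo, hi)

def disparity_pvalue(metric, groups, g_a, g_b, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in a per-trajectory
    metric (e.g., harm indicator or coverage indicator) between two
    protected subgroups."""
    rng = np.random.default_rng(seed)
    a, b = metric[groups == g_a], metric[groups == g_b]
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        exceed += abs(perm[:len(a)].mean() - perm[len(a):].mean()) >= observed
    return (exceed + 1) / (n_perm + 1)  # add-one smoothing avoids p = 0
```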