🤖 AI Summary
This study addresses a limitation of existing public health policy models: they often neglect individual behavioral responses and real-world uncertainties such as surveillance errors and implementation deviations, which biases the evaluation of interventions. To overcome this, the authors propose a dynamic simulation framework that integrates rational individual decision-making with two sources of uncertainty (measurement and execution) within a reinforcement learning-driven policy optimization system. The architecture unifies multidimensional non-pharmaceutical interventions, behavioral feedback, and uncertainty in a single framework. A hierarchical reinforcement learning approach, combining deep Q-networks (DQN) with uncertainty-aware DDPG and TD3 variants, coordinates the interaction between 1,000 simulated individuals and policymakers. In the reported simulations, the approach manages the epidemic's progression, mask-wearing and vaccination significantly reduce the outbreak's peak height and duration under uncertainty, and the resulting intervention strategies are described as more effective and more robust.
📝 Abstract
Purpose: The WHO's COVID-19 non-pharmaceutical interventions (e.g., lockdowns, vaccinations) effectively curb transmission but impose heavy economic strains. Existing research often neglects individual behaviors and falsely assumes perfect infection tracking and flawless policy execution, failing to account for real-world uncertainties and errors.
Methods: We propose an integrative approach incorporating uncertainties in both epidemic measurement (infections/hospitalizations) and policy implementation. We built a simulation model of 1,000 individuals making real-time choices regarding mask-wearing, vaccination, and shopping. Concurrently, policymakers deploy interventions (lockdowns, mandates) based on health and economic observations. This framework is driven by hierarchical reinforcement learning agents, utilizing deep Q-networks alongside uncertainty-aware policy gradient variants (DDPG and TD3).
Results: The simulations effectively managed the epidemic's progression. Masking and vaccinations proved highly effective, significantly reducing both the outbreak's peak height and duration. By integrating individual behaviors, policy uncertainties, and multifaceted interventions, our dynamic control approach successfully mitigated the epidemic's impact.
Conclusions: Our model overcomes previous research limitations by embedding uncertainty and human behavior into public health policy frameworks. The simulation demonstrates that accounting for individual choices and imperfect data is crucial for designing effective interventions during complex pandemics, with masks and vaccines serving as pivotal tools.
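
The two uncertainty sources named in the Methods (epidemic measurement and policy implementation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify a noise model, so the multiplicative Gaussian measurement noise, the additive Gaussian execution deviation, the function names, the parameter values, and the simple threshold rule standing in for the learned policy are all assumptions made here for illustration.

```python
import random


def observe_infections(true_infected: int, noise_sd: float, rng: random.Random) -> int:
    """Measurement uncertainty: the policymaker sees a noisy infection count.

    Assumption: multiplicative Gaussian noise; the source does not state a model.
    """
    noisy = true_infected * (1.0 + rng.gauss(0.0, noise_sd))
    return max(0, round(noisy))


def execute_policy(intended_level: float, deviation_sd: float, rng: random.Random) -> float:
    """Execution uncertainty: the realized intervention differs from the intended one.

    Assumption: additive Gaussian deviation, clamped to the range [0, 1].
    """
    realized = intended_level + rng.gauss(0.0, deviation_sd)
    return min(1.0, max(0.0, realized))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
true_infected = 120     # hypothetical ground-truth count among 1,000 individuals

# The policymaker decides from the noisy observation, not from the true state.
seen = observe_infections(true_infected, noise_sd=0.2, rng=rng)

# Hypothetical threshold rule used in place of a learned DDPG/TD3 policy.
intended = 0.7 if seen > 100 else 0.2

# The intervention that actually takes effect deviates from what was intended.
realized = execute_policy(intended, deviation_sd=0.1, rng=rng)

print(f"true={true_infected} observed={seen} intended={intended} realized={realized:.2f}")
```

In a setup like this, a learning agent would be trained on `seen` and would have `realized` applied to the simulated population, so the policy it learns has to tolerate both kinds of error.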