🤖 AI Summary
Optimizing charging of inhomogeneous Dicke quantum batteries under partial observability remains challenging due to limited measurement access to the full quantum state.
Method: We propose a reinforcement learning–based piecewise-constant charging protocol, embedded within a multi-level observability framework—spanning single two-level-system (TLS) energy, first-order averages, and second-order correlations—to systematically quantify how information loss impacts charging performance.
Contribution/Results: Crucially, incorporating second-order correlation measurements substantially mitigates partial-observability losses: observing only local energies and pairwise correlations recovers 94%–98% of the energy storage efficiency achievable under full-state tomography. Furthermore, we identify that non-myopic scheduling is essential for achieving rapid, low-fluctuation ergodic work extraction. Under full observability, our protocol approaches the theoretical optimum while exhibiting markedly reduced power fluctuations. This work establishes a new paradigm for practical, measurement-efficient charging of quantum batteries in resource-constrained experimental settings.
📝 Abstract
Charging optimization is a key challenge to the implementation of quantum batteries, particularly under inhomogeneity and partial observability. This paper employs reinforcement learning to optimize piecewise-constant charging policies for an inhomogeneous Dicke battery. We systematically compare policies across four observability regimes, from full-state access to experimentally accessible observables (energies of individual two-level systems (TLSs), first-order averages, and second-order correlations). Simulation results demonstrate that full observability yields near-optimal ergotropy with low variability, while under partial observability, access to only single-TLS energies or energies plus first-order averages lags behind the fully observed baseline. However, augmenting partial observations with second-order correlations recovers most of the gap, reaching 94%-98% of the full-state baseline. The learned schedules are nonmyopic, trading temporary plateaus or declines for superior terminal outcomes. These findings highlight a practical route to effective fast-charging protocols under realistic information constraints.