🤖 AI Summary
This work addresses the challenge of energy-efficient sleep control in mobile networks under long-term, time-averaged quality-of-service (QoS) constraints, which render the optimization problem non-Markovian. To tackle this, the study introduces reward machines to this domain for the first time, using abstract states to explicitly track historical QoS violations. This enables a reinforcement learning agent to balance immediate energy savings against long-term QoS guarantees despite the non-Markovian environment. The proposed approach models time-averaged QoS constraints directly and supports efficient sleep scheduling across diverse traffic loads. It significantly reduces energy consumption while satisfying stringent requirements, such as packet loss rates for delay-sensitive services and throughput targets for constant-bit-rate users, thereby offering a scalable new paradigm for network resource management.
📝 Abstract
Energy efficiency in mobile networks is crucial for sustainable telecommunications infrastructure, particularly as network densification continues to increase power consumption. Sleep mechanisms for network components can reduce energy use, but deciding which components to put to sleep, when, and for how long, while preserving quality of service (QoS), remains a difficult optimisation problem. In this paper, we utilise reinforcement learning with reward machines (RMs) to make sleep-control decisions that balance immediate energy savings against long-term QoS impact, i.e. time-averaged packet drop rates for deadline-constrained traffic and time-averaged minimum-throughput guarantees for constant-rate users. A key challenge is that time-averaged constraints depend on cumulative performance over time rather than immediate performance. As a result, the effective reward is non-Markovian, and optimal actions depend on operational history rather than the instantaneous system state. RMs capture this history dependence by maintaining an abstract state that explicitly tracks QoS constraint violations over time. Our framework provides a principled, scalable approach to energy management for next-generation mobile networks under diverse traffic patterns and QoS requirements.
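To make the reward-machine idea concrete, here is a minimal sketch, not the paper's implementation, of how an RM's abstract state can carry the cumulative drop-rate history that a Markovian state omits. The class name, the 5% drop threshold, and the penalty value are illustrative assumptions, not values from the paper.

```python
from dataclasses import dataclass

@dataclass
class QoSRewardMachine:
    """Sketch of a reward machine whose abstract state tracks whether the
    running-average packet drop rate violates a time-averaged QoS target.
    All numeric values below are hypothetical, not from the paper."""
    drop_target: float = 0.05     # assumed time-averaged drop-rate limit
    drops: int = 0                # cumulative dropped packets (the history)
    packets: int = 0              # cumulative offered packets
    state: str = "OK"             # abstract RM state: "OK" or "VIOLATING"

    def step(self, dropped: int, sent: int, energy_saved: float) -> float:
        """Advance the RM on one environment step and return a shaped reward."""
        self.drops += dropped
        self.packets += sent
        avg_drop = self.drops / max(self.packets, 1)
        # RM transition: the abstract state flips on the *cumulative* average,
        # exactly the history dependence the instantaneous state cannot carry.
        self.state = "VIOLATING" if avg_drop > self.drop_target else "OK"
        # Reward depends on the RM state: energy savings count only while the
        # long-term QoS constraint holds; violations incur a flat penalty.
        penalty = 10.0 if self.state == "VIOLATING" else 0.0
        return energy_saved - penalty
```

An agent conditioned on (environment state, RM state) sees a Markovian problem again: the same sleep action yields a positive reward while the running average is within target and a penalized one once cumulative violations accumulate.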