🤖 AI Summary
This paper addresses the joint optimization of communication and control in discrete-time stochastic linear systems, focusing on coordinated decision-making between a scheduler (which decides when to communicate) and a controller (which designs inputs from intermittently received observations) under a partially nested information structure. The paper proves that the optimal controller in this setting admits a certainty-equivalence form. Leveraging this insight, the authors propose InterQ, a deep Q-network (DQN)-based learning framework for intermittent scheduling, in which the scheduling problem is formulated as a partially observable Markov decision process (POMDP). Experiments demonstrate that InterQ significantly outperforms periodic and event-triggered baselines in balancing control performance against communication cost, while generalizing well across system parameters and noise statistics. The implementation is publicly available.
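To illustrate the certainty-equivalence structure described above, the following minimal sketch (a scalar system with assumed parameters, not the paper's implementation) propagates the controller's estimate through the system model and resets it whenever the scheduler transmits; a simple error-threshold rule stands in for the learned scheduler:

```python
# Illustrative sketch (not the paper's implementation): one closed-loop run of a
# scalar discrete-time stochastic linear system x_{k+1} = a x_k + b u_k + w_k.
# The scheduler below uses a placeholder error threshold; InterQ would replace
# it with a learned DQN policy. All names and values are assumptions.
import random

a, b = 1.2, 1.0           # open-loop unstable scalar system (assumed values)
gain = (a - 0.5) / b      # any stabilizing feedback gain: a - b*gain = 0.5
noise_std = 0.1
threshold = 0.5           # transmit when the estimation error grows too large

x = 1.0                   # true state (always visible to the scheduler)
x_hat = 0.0               # controller's estimate from intermittent receptions
comms = 0
random.seed(0)

for k in range(50):
    # Scheduler: sees x, and can replicate x_hat because its information
    # contains everything the controller knows (nested information)
    if abs(x - x_hat) > threshold:
        x_hat = x         # transmit: the controller's estimate resets to x
        comms += 1
    # Controller: certainty-equivalent input, acting on the estimate
    u = -gain * x_hat
    # Plant and estimator propagate through the same model; only the plant
    # is driven by the process noise w_k
    w = random.gauss(0.0, noise_std)
    x = a * x + b * u + w
    x_hat = a * x_hat + b * u

print(f"final |x| = {abs(x):.3f}, transmissions = {comms}/50")
```

The nesting of information is what makes this work: since the scheduler can reconstruct the controller's estimate exactly, it can trigger a transmission precisely when the estimation error warrants one.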
📝 Abstract
In this letter, we explore the communication-control co-design of discrete-time stochastic linear systems through reinforcement learning. Specifically, we examine a closed-loop system involving two sequential decision-makers: a scheduler and a controller. The scheduler continuously monitors the system's state but transmits it to the controller only intermittently, balancing communication cost against control performance. The controller, in turn, computes the control input based on the intermittently received information. Given the partially nested information structure, we show that the optimal control policy takes a certainty-equivalence form. We then analyze the qualitative behavior of the scheduling policy. To obtain the optimal scheduling policy, we propose InterQ, a deep reinforcement learning algorithm that uses a deep neural network to approximate the Q-function. Through extensive numerical evaluations, we analyze the scheduling landscape and compare our approach against two baseline strategies: (a) a multi-period periodic scheduling policy, and (b) an event-triggered policy. The results demonstrate that our proposed method outperforms both baselines. The open-source implementation is available at https://github.com/AC-sh/InterQ.
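To make the scheduling formulation concrete, the sketch below casts the transmit/stay-silent decision as a reinforcement-learning problem, using a tabular Q-learning stand-in for the paper's DQN on a discretized estimation error; the reward trade-off, discretization, and all parameters are assumptions for illustration only:

```python
# Tabular Q-learning stand-in for a DQN scheduler (assumed setup, not the
# paper's implementation). The scheduler's observation is the estimation error
# |x - x_hat|, discretized into bins; actions: 0 = stay silent, 1 = transmit.
# Reward trades off quadratic state cost against a per-transmission price.
import random

a, b = 1.2, 1.0
gain = (a - 0.5) / b        # certainty-equivalent feedback gain (assumed)
comm_price = 0.2            # price per transmission (assumed trade-off weight)
bins, max_err = 10, 2.0

def bucket(err):
    """Map an estimation error to a discrete observation bin."""
    return min(int(abs(err) / max_err * bins), bins - 1)

Q = [[0.0, 0.0] for _ in range(bins)]
alpha, gamma, eps = 0.1, 0.95, 0.1   # learning rate, discount, exploration
random.seed(0)

for episode in range(500):
    x, x_hat = random.gauss(0.0, 1.0), 0.0
    for k in range(30):
        s = bucket(x - x_hat)
        # epsilon-greedy action selection over {silent, transmit}
        act = random.randrange(2) if random.random() < eps else int(Q[s][1] > Q[s][0])
        if act:
            x_hat = x                 # transmission resets the estimate
        u = -gain * x_hat             # certainty-equivalent control input
        w = random.gauss(0.0, 0.1)
        x = a * x + b * u + w
        x_hat = a * x_hat + b * u
        r = -(x * x + comm_price * act)
        s2 = bucket(x - x_hat)
        # standard one-step Q-learning update
        Q[s][act] += alpha * (r + gamma * max(Q[s2]) - Q[s][act])

# Learned policy over error bins: 1 = transmit, 0 = stay silent
policy = [int(Q[s][1] > Q[s][0]) for s in range(bins)]
print(policy)
```

InterQ replaces the table with a deep neural network over a continuous belief-like input, but the ingredients (intermittent transmission as the action, a control-cost-plus-communication-price reward) are the same shape as this toy version.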