🤖 AI Summary
To simultaneously achieve covert communication and intelligent anti-jamming under moving reactive jamming, this paper proposes a parallel deep reinforcement learning (DRL) framework. The method decouples the joint action space so that covertness (low-power spread-spectrum transmission to evade high-power tracking jamming) and robustness (dynamic frequency adaptation to counter indiscriminate sweeping) can be optimized jointly. A novel parallel exploration-exploitation selection mechanism replaces the conventional ε-greedy policy, significantly accelerating convergence. Simulation results demonstrate that, under complex time-varying jamming, the proposed approach improves normalized throughput by nearly 90% while enhancing both communication robustness and covertness. This work constitutes the first application of parallel DRL to moving anti-jamming communications, establishing a new paradigm for intelligent covert communication in highly dynamic wireless environments.
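The action-space decoupling can be pictured with a minimal PyTorch sketch. This is not the paper's implementation: the class name `ParallelQNet`, the state dimension, layer widths, and the action-space sizes (`N_FREQ` channels, `N_SPREAD` spreading factors) are all illustrative assumptions. The point it shows is structural: two parallel Q-heads over the sub-actions need F + S outputs instead of F × S for a single joint head.

```python
# Hypothetical sketch of decoupling a joint (frequency x spreading) action
# space into two parallel Q-heads over a shared state encoder. All sizes
# are assumptions for illustration, not values from the paper.
import torch
import torch.nn as nn

N_FREQ, N_SPREAD = 16, 4  # assumed sub-action-space sizes

class ParallelQNet(nn.Module):
    def __init__(self, state_dim: int, n_freq: int = N_FREQ, n_spread: int = N_SPREAD):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        # One head per sub-action: n_freq + n_spread outputs
        # instead of n_freq * n_spread for a joint head.
        self.q_freq = nn.Linear(128, n_freq)      # channel selection
        self.q_spread = nn.Linear(128, n_spread)  # spreading factor / power

    def forward(self, state: torch.Tensor):
        h = self.encoder(state)
        return self.q_freq(h), self.q_spread(h)

net = ParallelQNet(state_dim=32)
q_f, q_s = net(torch.randn(1, 32))
action = (q_f.argmax(-1).item(), q_s.argmax(-1).item())  # (channel, spreading)
```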
📝 Abstract
This paper addresses the challenge of anti-jamming in moving reactive jamming scenarios. A moving reactive jammer initiates high-power tracking jamming upon detecting any transmission activity and, when unable to detect a signal, resorts to indiscriminate jamming. This imposes dual imperatives: remaining hidden to avoid the jammer's detection while simultaneously evading its indiscriminate jamming. Spread-spectrum techniques effectively reduce transmit power to elude detection but fall short in countering indiscriminate jamming. Conversely, changing communication frequencies can help evade indiscriminate jamming, but without spread-spectrum techniques to remain hidden, the transmission is vulnerable to tracking jamming. Current methodologies struggle to optimize these two requirements simultaneously because of the expansive joint action space and the dynamics of moving reactive jammers. To address these challenges, we propose a parallelized deep reinforcement learning (DRL) strategy. The approach includes a parallelized network architecture designed to decompose the action space, and a parallel exploration-exploitation selection mechanism replaces the $\varepsilon$-greedy mechanism, accelerating convergence. Simulations demonstrate a nearly 90% increase in normalized throughput.
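The abstract does not spell out how the parallel exploration-exploitation selection works, so the sketch below is only one plausible reading: instead of a single $\varepsilon$-greedy actor, a pool of parallel actors is split so that a fixed subset acts greedily (exploitation) while the rest act randomly (exploration). It reuses the hypothetical `ParallelQNet` from the sketch above; `parallel_select` and `n_exploit` are invented names for illustration.

```python
# Hypothetical interpretation of parallel exploration-exploitation selection:
# of several parallel actors, the first n_exploit act greedily and the rest
# act uniformly at random, replacing a single epsilon-greedy actor.
import random
import torch

def parallel_select(q_net, states, n_exploit: int):
    """states: list of per-actor state tensors; first n_exploit actors exploit."""
    actions = []
    with torch.no_grad():
        for i, s in enumerate(states):
            q_f, q_s = q_net(s.unsqueeze(0))
            if i < n_exploit:  # exploitation branch: greedy on both heads
                actions.append((q_f.argmax(-1).item(), q_s.argmax(-1).item()))
            else:              # exploration branch: random sub-actions
                actions.append((random.randrange(q_f.shape[-1]),
                                random.randrange(q_s.shape[-1])))
    return actions

states = [torch.randn(32) for _ in range(8)]
acts = parallel_select(net, states, n_exploit=6)  # 6 exploit, 2 explore
```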