🤖 AI Summary
This work addresses three-dimensional trajectory planning for unmanned aerial vehicle (UAV)-assisted visible light communication (VLC) systems, in which a UAV collects data from ground users. The problem is formulated as a mixed-integer non-convex optimization and solved in two stages: a closed-form expression gives the optimal flight altitude under a prescribed channel gain threshold, and a pheromone-inspired reward mechanism guides horizontal trajectory optimization with the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. In simulations, the derived optimal altitude reduces flight distance by up to 35% compared with baseline methods, and the proposed reward mechanism cuts the number of convergence steps by approximately 50%.
📝 Abstract
Recently, the integration of unmanned aerial vehicle (UAV) and visible light communication (VLC) technologies has emerged as a promising way to provide flexible communication and efficient lighting. This letter investigates three-dimensional trajectory planning in a UAV-assisted VLC system, where a UAV is dispatched to collect data from ground users (GUs). The core objective is to develop a trajectory planning framework that minimizes the UAV flight distance, which is equivalent to maximizing the data collection efficiency. The problem is formulated as a challenging mixed-integer non-convex optimization problem. To tackle it, we first derive a closed-form optimal flight altitude under a specific VLC channel gain threshold. Subsequently, we optimize the UAV horizontal trajectory by integrating a novel pheromone-driven reward mechanism with the twin delayed deep deterministic policy gradient (TD3) algorithm, which enables an adaptive UAV motion strategy in complex environments. Simulation results show that the derived optimal altitude reduces the flight distance by up to 35% compared to baseline methods. Additionally, the proposed reward mechanism shortens the convergence steps by approximately 50%, demonstrating notable efficiency gains for UAV-assisted VLC data collection.
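To make the altitude step concrete, the following is a minimal sketch of how a closed-form altitude can arise from a channel gain threshold. It is not the letter's derivation, which the abstract does not reproduce. It assumes the standard line-of-sight Lambertian VLC model with a downward-pointing LED, an upward-facing photodetector, unit optical filter and concentrator gains, and no field-of-view limit. It further assumes that the "optimal" altitude is the one that maximizes the ground radius within which the gain stays at or above the threshold. All function names and numeric values are hypothetical.

```python
import math


def lambertian_order(half_power_angle_deg: float) -> float:
    """Lambertian order m from the LED semi-angle at half power."""
    return -math.log(2.0) / math.log(math.cos(math.radians(half_power_angle_deg)))


def channel_gain(h: float, r: float, m: float, detector_area: float) -> float:
    """LOS Lambertian gain at altitude h and horizontal offset r.

    Assumed geometry: irradiance and incidence angles coincide, so both
    cosines equal h / d with d = sqrt(h^2 + r^2).
    """
    c = (m + 1.0) * detector_area / (2.0 * math.pi)
    return c * h ** (m + 1.0) / (h * h + r * r) ** ((m + 3.0) / 2.0)


def coverage_radius(h: float, gain_threshold: float, m: float,
                    detector_area: float) -> float:
    """Largest offset r at altitude h with gain >= threshold (0 if none)."""
    c = (m + 1.0) * detector_area / (2.0 * math.pi)
    r_sq = (c * h ** (m + 1.0) / gain_threshold) ** (2.0 / (m + 3.0)) - h * h
    return math.sqrt(r_sq) if r_sq > 0.0 else 0.0


def best_altitude(gain_threshold: float, m: float, detector_area: float) -> float:
    """Altitude maximizing coverage_radius under the assumptions above.

    Obtained by setting d(r^2)/dh = 0:
    h* = sqrt(c / H_th) * ((m + 1) / (m + 3)) ** ((m + 3) / 4).
    """
    c = (m + 1.0) * detector_area / (2.0 * math.pi)
    return math.sqrt(c / gain_threshold) * ((m + 1.0) / (m + 3.0)) ** ((m + 3.0) / 4.0)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration.
    m = lambertian_order(60.0)      # 60 degree semi-angle gives m = 1
    area = 1e-4                     # photodetector area in m^2
    h_th = 1e-6                     # channel gain threshold
    h_star = best_altitude(h_th, m, area)
    r_star = coverage_radius(h_star, h_th, m, area)
    print(f"altitude: {h_star:.3f} m, coverage radius: {r_star:.3f} m")
```

Under these assumptions, flying lower shrinks the footprint because users at the edge see a large incidence angle, while flying higher shrinks it through path loss, so the coverage radius has a single interior maximum. A larger footprint lets the UAV serve each user from farther away horizontally, which is one plausible reason an altitude choice can shorten the flight distance.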