Resilient UAV Trajectory Planning via Few-Shot Meta-Offline Reinforcement Learning

📅 2025-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Joint optimization of unmanned aerial vehicle (UAV) trajectory and task scheduling in low-power IoT networks is challenging for online reinforcement learning (RL) due to safety risks, high operational costs, and poor generalization, especially under data scarcity and abrupt environmental changes. Method: We propose a few-shot meta-offline RL framework that integrates conservative offline RL (CQL) with model-agnostic meta-learning (MAML), enabling rapid cross-environment adaptation without any online interaction. The problem is formulated using Age-of-Information (AoI) as the performance metric. Contribution/Results: Our method jointly optimizes AoI and transmission power using only minimal offline datasets. It converges faster than DQN and standard CQL, is more robust to network failures and dynamic environmental shifts, and generalizes across heterogeneous deployment scenarios despite severe data constraints.
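The conservative term that distinguishes CQL from standard offline Q-learning can be sketched in a few lines. This is an illustrative toy, not the paper's exact loss: the batch, network, and `alpha` weight here are assumptions for demonstration.

```python
import numpy as np

def cql_penalty(q_values, actions, alpha=1.0):
    """Conservative regularizer in the style of CQL (illustrative sketch).

    q_values: (batch, n_actions) Q-estimates for each state in the batch.
    actions:  (batch,) actions actually logged in the offline dataset.

    Pushes down a soft maximum (logsumexp) of Q over all actions while
    pushing up the Q-value of the dataset action, so the learned Q stays
    conservative on actions the offline data never tried.
    """
    lse = np.log(np.sum(np.exp(q_values), axis=1))       # soft-max over actions
    q_data = q_values[np.arange(len(actions)), actions]  # Q of logged actions
    return alpha * float(np.mean(lse - q_data))

# Toy batch: two states, three candidate actions each
q = np.array([[1.0, 0.5, -0.2],
              [0.1, 2.0, 0.3]])
a = np.array([0, 1])  # actions taken in the offline dataset
print(cql_penalty(q, a))  # non-negative conservatism gap
```

In the full algorithm this penalty is added to the usual TD loss; in the meta-learning outer loop, MAML then adapts the resulting Q-network to a new environment with only a few gradient steps on a few-shot offline batch.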

📝 Abstract
Reinforcement learning (RL) has been a promising technique for future 5G-beyond and 6G systems. Its main advantage lies in its robust model-free decision-making in complex, high-dimensional wireless environments. However, most existing RL frameworks rely on online interaction with the environment, which might not be feasible due to safety and cost concerns. Another problem with online RL is the lack of scalability of the designed algorithm to dynamic or new environments. This work proposes a novel, resilient, few-shot meta-offline RL algorithm combining offline RL using conservative Q-learning (CQL) and meta-learning using model-agnostic meta-learning (MAML). The proposed algorithm can train RL models using static offline datasets without any online interaction with the environments. In addition, with the aid of MAML, the proposed model can be scaled up to new unseen environments. We showcase the proposed algorithm by optimizing an unmanned aerial vehicle (UAV)'s trajectory and scheduling policy to minimize the age-of-information (AoI) and transmission power of limited-power devices. Numerical results show that the proposed few-shot meta-offline RL algorithm converges faster than baseline schemes, such as deep Q-networks and CQL. In addition, it is the only algorithm that can achieve optimal joint AoI and transmission power using an offline dataset with few shots of data points and is resilient to network failures due to unprecedented environmental changes.
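The AoI metric that the UAV's scheduling policy minimizes follows a simple per-slot recursion. The sketch below is a common formulation and an assumption here, not code from the paper; the reset-to-1 convention and the `scheduled`/`success` arguments are illustrative.

```python
def step_aoi(aoi, scheduled, success):
    """One time-slot age-of-information update (illustrative sketch).

    aoi:       list of current AoI values, one per IoT device.
    scheduled: index of the device the UAV schedules this slot, or None.
    success:   whether that device's packet was decoded correctly.

    Every device's age grows by one each slot; a successful delivery
    resets the scheduled device's AoI to 1 (the age of the fresh packet).
    """
    new = [a + 1 for a in aoi]
    if scheduled is not None and success:
        new[scheduled] = 1
    return new

aoi = [3, 7, 2]
aoi = step_aoi(aoi, scheduled=1, success=True)
print(aoi)  # [4, 1, 3]
```

The joint objective then trades the average of these ages against the transmission power spent by the limited-power devices, which is what the learned Q-function scores.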
Problem

Research questions and friction points this paper is trying to address.

Drone Learning
Adaptive Path Planning
Limited Data Environment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Offline-Meta Learning
Drone Path Planning
Adaptability in New Environments