🤖 AI Summary
Existing approaches for detecting UAV flight states (hovering, cruising, ascending, and transitioning) in dynamic environments suffer from poor robustness, weak generalization under small-sample conditions, and high computational overhead. To address these challenges, this paper proposes a temporal classification framework that integrates Transformer encoders, conditional Generative Adversarial Networks (cGANs), and Multi-Instance Learning (MIL). The Transformer captures long-range temporal dependencies; the cGAN synthesizes high-fidelity telemetry data to alleviate data scarcity; and MIL focuses adaptively on discriminative time segments, which suppresses noise, aids feature selection, and improves model interpretability. Evaluated on the DroneDetect and DroneRF datasets, the method achieves 96.5% and 98.6% classification accuracy, respectively, outperforming the state-of-the-art methods it is compared against. It also incurs low computational cost and generalizes across diverse UAV platforms.
📝 Abstract
Unmanned Aerial Vehicles (UAVs) are increasingly used in surveillance, logistics, agriculture, disaster management, and military operations. Accurate detection and classification of UAV flight states, such as hovering, cruising, ascending, or transitioning, is essential for safe and effective operations. However, conventional time series classification (TSC) methods often lack robustness and generalization in dynamic UAV environments, while state-of-the-art (SOTA) models such as Transformers and LSTM-based architectures typically require large datasets and entail high computational costs, especially with high-dimensional data streams. This paper proposes a novel framework that integrates a Transformer-based Generative Adversarial Network (GAN) with Multiple Instance Locally Explainable Learning (MILET) to address these challenges in UAV flight state classification. The Transformer encoder captures long-range temporal dependencies and complex telemetry dynamics, while the GAN module augments limited datasets with realistic synthetic samples. Multiple instance learning (MIL) is incorporated to focus attention on the most discriminative input segments, reducing noise and computational overhead. Experimental results show that the proposed method achieves 96.5% accuracy on the DroneDetect dataset and 98.6% on the DroneRF dataset, outperforming other SOTA approaches. The framework also demonstrates strong computational efficiency and robust generalization across diverse UAV platforms and flight states, highlighting its potential for real-time deployment in resource-constrained environments.
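To make the MIL idea concrete, the following is a minimal sketch of attention-based pooling over time segments. It is not the paper's implementation: the function names (`softmax`, `mil_attention_pool`), the dot-product scoring rule, and the toy inputs are all assumptions made for illustration. Each time segment is treated as an instance with an embedding, a scoring vector assigns each instance a relevance score, and the bag-level representation is the attention-weighted sum of the instance embeddings.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mil_attention_pool(
    instances: Sequence[Sequence[float]],
    scoring_vector: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Pool per-segment embeddings into a single bag embedding.

    Assumption: relevance is a plain dot product with `scoring_vector`;
    a trained model would learn this scoring function.
    Returns (bag_embedding, attention_weights).
    """
    # One relevance score per time segment (instance).
    scores = [
        sum(w * x for w, x in zip(scoring_vector, inst)) for inst in instances
    ]
    # Normalize scores so the weights are positive and sum to 1.
    weights = softmax(scores)
    dim = len(instances[0])
    # Attention-weighted sum of instance embeddings, one dimension at a time.
    bag = [
        sum(a * inst[d] for a, inst in zip(weights, instances))
        for d in range(dim)
    ]
    return bag, weights


# Hypothetical toy input: three time segments, each a 2-D embedding.
segments = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
scoring_vector = [1.0, 1.0]  # fixed here for illustration; learned in practice
bag, weights = mil_attention_pool(segments, scoring_vector)
```

In this toy input the second segment receives the largest attention weight, so it dominates the bag embedding. The same weights can be inspected to see which segments drove a prediction, which is the sense in which MIL supports interpretability.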