🤖 AI Summary
To address the multi-task synergy requirements of communication, sensing, and localization in 6G networks, this paper proposes WavesFM, a unified wireless foundation model. Methodologically, WavesFM combines: (1) a fully parameter-shared architecture supporting multimodal radio inputs, including spectrograms, channel state information (CSI), and in-phase/quadrature (IQ) samples; and (2) a Vision Transformer (ViT) backbone operating on OFDM resource grids, augmented with task-specific MLP heads and LoRA-based parameter-efficient fine-tuning. The paper further shows empirically that pretraining on domain-relevant data delivers two benefits: accelerated convergence and improved performance across diverse wireless downstream tasks. Evaluated on 5G NR positioning, MIMO-OFDM channel estimation, human activity sensing, and RF signal classification, WavesFM achieves superior accuracy over individually trained supervised models while sharing 80% of its parameters across tasks and reducing training time by up to 5×, with low computational and memory overhead.
📝 Abstract
This paper introduces WavesFM, a novel Wireless Foundation Model (WFM) framework, capable of supporting a wide array of communication, sensing, and localization tasks. Our proposed architecture combines a shared Vision Transformer (ViT) backbone with task-specific multi-layer perceptron (MLP) heads and incorporates Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning. This design promotes full parameter sharing across tasks, significantly reducing the computational and memory footprint without sacrificing performance. The model processes both image-like wireless modalities, such as spectrograms and channel state information (CSI), and in-phase and quadrature (IQ) signals arranged as orthogonal frequency-division multiplexing (OFDM) resource grids. We demonstrate the strong generalization capabilities of WavesFM through extensive experiments on four downstream tasks: Fifth Generation New Radio (5G NR) positioning; multiple-input multiple-output OFDM (MIMO-OFDM) channel estimation; human activity sensing; and radio-frequency (RF) signal classification. Compared to supervised baselines trained individually, our approach achieves superior performance while sharing 80% of its parameters across tasks. Furthermore, we show that pretraining on domain-relevant data not only boosts performance but also accelerates convergence, reducing training time by up to 5x. These results demonstrate that our unified WFM can support diverse tasks and deliver significant gains in both performance and efficiency, highlighting the transformative potential of foundation models to drive AI-native paradigms in future sixth-generation (6G) networks.
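The parameter-efficient fine-tuning described above can be illustrated with a minimal LoRA-style sketch (illustrative NumPy code under stated assumptions, not the authors' implementation): a frozen weight W of the shared backbone is adapted per task by a low-rank update B @ A, so only the small factors A and B are trained while the backbone itself remains shared across all tasks.

```python
import numpy as np

# Hypothetical minimal LoRA-style adapter; all names and sizes are illustrative.
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 4, 8

W = rng.standard_normal((d_out, d_in))      # shared, frozen backbone weight
A = rng.standard_normal((r, d_in)) * 0.01   # task-specific low-rank factor
B = np.zeros((d_out, r))                    # zero-initialized: no change at start

def lora_forward(x):
    # Effective weight: W + (alpha / r) * B @ A
    return x @ (W + (alpha / r) * B @ A).T

x = rng.standard_normal((1, d_in))
# With B = 0 the adapted layer matches the frozen backbone exactly.
assert np.allclose(lora_forward(x), x @ W.T)

trainable = A.size + B.size   # 512 parameters per adapted layer
frozen = W.size               # 4096 shared parameters
print(f"trainable fraction: {trainable / (trainable + frozen):.2%}")  # → 11.11%
```

The design choice this mirrors is the one the abstract highlights: because the low-rank factors are the only per-task parameters, most of the model can be shared across positioning, channel estimation, sensing, and classification heads.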