🤖 AI Summary
This work addresses the limited transferability of existing wireless foundation models, which typically rely on masked signal reconstruction and can therefore overemphasize low-level signal details. The authors propose LatentWave, a wireless foundation model pretrained with a Joint-Embedding Predictive Architecture (JEPA) on diverse radio frequency (RF) spectrograms and channel state information (CSI). LatentWave predicts masked regions in latent space rather than in signal space, which yields representations that transfer more readily out of the box. Two architectural choices, per-channel patch embeddings and random channel sampling during pretraining, let the model handle variable antenna counts and generalize across heterogeneous wireless configurations. Evaluated on four downstream tasks (RF signal classification, 5G NR positioning, beam prediction, and LoS/NLoS classification), LatentWave consistently outperforms WavesFM, a masked-modeling baseline pretrained on the same data. The study also shows that mask geometry strongly affects downstream performance: frequency masking favors channel-related tasks such as positioning and beam prediction, while region masking better preserves discriminability for signal classification.
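The channel-wise embedding and random channel sampling mentioned above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `sample_channels` and `patch_embed_per_channel`, the patch size, the embedding width, and the use of a single shared linear projection are all assumptions chosen for illustration. The point is only that when each channel is embedded independently with shared weights, the same model can accept any number of antennas.

```python
import numpy as np

def sample_channels(x, k, rng):
    """Keep a random subset of k channels (e.g. antennas) from x of shape (C, H, W)."""
    idx = np.sort(rng.choice(x.shape[0], size=k, replace=False))
    return x[idx], idx

def patch_embed_per_channel(x, weight, patch):
    """Embed every channel independently with one shared projection (assumed design).

    x: (C, H, W); weight: (patch * patch, D).
    Returns tokens of shape (C, N, D) with N = (H // patch) * (W // patch).
    """
    c, h, w = x.shape
    gh, gw = h // patch, w // patch
    x = x[:, :gh * patch, :gw * patch]            # drop any ragged border
    patches = x.reshape(c, gh, patch, gw, patch)  # split both axes into patches
    patches = patches.transpose(0, 1, 3, 2, 4)    # (C, gh, gw, patch, patch)
    patches = patches.reshape(c, gh * gw, patch * patch)
    return patches @ weight                       # same weights for every channel

rng = np.random.default_rng(0)
weight = rng.standard_normal((16, 32)) * 0.02     # hypothetical 4x4 patches, D = 32
x = rng.standard_normal((8, 32, 32))              # hypothetical 8-antenna input
x_sub, kept = sample_channels(x, k=3, rng=rng)
tokens = patch_embed_per_channel(x_sub, weight, patch=4)
print(tokens.shape)  # (3, 64, 32)
```

Because the projection is shared, embedding a subset of channels gives the same tokens as embedding all channels and then selecting that subset, which is what makes a variable channel count workable in this sketch.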
📝 Abstract
Wireless foundation models have emerged as a promising alternative to building separate models for each wireless task. However, existing approaches rely on masked input reconstruction, which can bias representations toward low-level signal details. In this paper, we propose LatentWave, a wireless foundation model pretrained using a Joint-Embedding Predictive Architecture (JEPA) on diverse wireless spectrograms and channel state information (CSI). By predicting masked regions in latent space, LatentWave learns representations that are more transferable out of the box across diverse downstream tasks. The proposed architecture employs per-channel patch embeddings with stochastic channel sampling during pretraining, allowing it to process variable antenna counts and improving usability across heterogeneous wireless configurations. We evaluate LatentWave on four downstream tasks: RF signal classification, 5G NR positioning, beam prediction, and LoS/NLoS classification, comparing against a masked-modeling baseline (WavesFM) pretrained on the same data. Additionally, we show that the masking geometry introduces a task-dependent inductive bias: frequency masking strongly favors channel-related tasks such as positioning and beam prediction, while region masking better preserves discriminability for signal classification.
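To make the two masking geometries concrete, the following NumPy sketch builds a frequency mask and a region mask over a grid of patches, and computes a prediction loss on the masked latents only. It is a sketch under stated assumptions, not the paper's procedure: the grid layout (rows as frequency, columns as time), the function names, the mask sizes, and the squared-error loss are all hypothetical stand-ins.

```python
import numpy as np

def frequency_mask(grid_f, grid_t, n_bands, rng):
    """Hide whole frequency rows: every time patch in the chosen bands is masked."""
    mask = np.zeros((grid_f, grid_t), dtype=bool)
    rows = rng.choice(grid_f, size=n_bands, replace=False)
    mask[rows, :] = True
    return mask

def region_mask(grid_f, grid_t, block_f, block_t, rng):
    """Hide one contiguous rectangular block of patches."""
    mask = np.zeros((grid_f, grid_t), dtype=bool)
    f0 = rng.integers(0, grid_f - block_f + 1)
    t0 = rng.integers(0, grid_t - block_t + 1)
    mask[f0:f0 + block_f, t0:t0 + block_t] = True
    return mask

def latent_loss(pred, target, mask):
    """Mean squared error between predicted and target latents, masked patches only.

    pred, target: (N, D) with N = grid_f * grid_t; the loss choice is an assumption.
    """
    m = mask.reshape(-1)
    return float(np.mean((pred[m] - target[m]) ** 2))

rng = np.random.default_rng(0)
fm = frequency_mask(8, 8, n_bands=3, rng=rng)
rm = region_mask(8, 8, block_f=4, block_t=6, rng=rng)
print(fm.sum(), rm.sum())  # 24 24
```

Both masks hide the same number of patches here, so the only difference is geometry: the frequency mask removes entire bands, while the region mask removes a local time-frequency block and leaves part of every band visible.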