🤖 AI Summary
This work addresses two problems in wireless localization: conventional model-based methods degrade significantly in complex multipath and non-line-of-sight environments, and existing learning-based approaches generalize poorly, often requiring costly retraining to adapt to new base station configurations or unseen environments. To overcome these challenges, the authors propose RA-LWLM, a framework that, for the first time, integrates retrieval-augmented in-context learning into wireless localization. RA-LWLM uses a frozen wireless foundation model to encode channel state information, retrieves relevant references from a per-scene fingerprint database via vector similarity search, and employs a Mixture-of-Experts (MoE) Transformer, whose experts specialize in different context sizes, to fuse the query with the retrieved references and predict the user position. Without any per-scene fine-tuning, the method achieves nearly identical accuracy on seen and unseen scenes and outperforms end-to-end and foundation-model baselines across diverse base station layouts and heterogeneous environments.
📝 Abstract
Wireless localization is a fundamental capability of sixth-generation (6G) networks. Conventional model-based methods require accurate modeling of the propagation environment and degrade in complex multipath and non-line-of-sight scenarios, while learning-based methods couple model parameters tightly to the training scene, requiring costly retraining whenever the base station (BS) configuration or propagation environment changes. In this paper, we propose RA-LWLM, a retrieval-augmented in-context localization framework that achieves training-free cross-scene adaptation by externalizing scene-specific information into a per-scene fingerprint database rather than encoding it in model weights. The framework consists of three components: a frozen wireless foundation model (FM) encoder that maps raw channel state information into a scene-agnostic representation; a retrieval module that selects the most informative references from the per-scene database via similarity search in the representation space; and a transformer-based in-context learning (ICL) module that fuses the query with the retrieved references to predict the user equipment (UE) position. To accommodate varying retrieval quality and propagation complexity across queries, the ICL module adopts a mixture-of-experts design in which experts specialize in different context sizes and are softly combined by a learnable selector. Extensive ray-tracing-based experiments across heterogeneous scenes with diverse BS configurations show that RA-LWLM achieves nearly identical accuracy on seen and unseen scenes without any per-scene retraining, substantially outperforming end-to-end and FM-based baselines. These results validate the proposed retrieval-augmented in-context paradigm as a scalable solution for cross-scene localization in 6G networks.
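The three-stage pipeline described in the abstract (encode, retrieve, fuse) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and every name in it (`normalize`, `build_database`, `retrieve`, `expert_predict`, `localize`) is hypothetical. Three simplifications are assumed: L2 normalization stands in for the frozen FM encoder, a similarity-weighted average of reference positions stands in for each Transformer expert, and fixed logits stand in for the learnable selector.

```python
import math


def normalize(vec):
    """Stand-in for the frozen FM encoder (assumption): L2-normalise features."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_database(fingerprints):
    """Per-scene database: list of (embedding, known position) pairs."""
    return [(normalize(feat), pos) for feat, pos in fingerprints]


def retrieve(query_emb, database, k):
    """Return the k (similarity, position) pairs closest to the query (cosine)."""
    scored = [
        (sum(q * f for q, f in zip(query_emb, emb)), pos)
        for emb, pos in database
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def expert_predict(query_emb, database, k, temperature=0.1):
    """Toy expert for context size k: similarity-weighted mean of positions.

    The paper uses a Transformer here; this average is only a placeholder.
    """
    refs = retrieve(query_emb, database, k)
    weights = softmax([sim / temperature for sim, _ in refs])
    dim = len(refs[0][1])
    return tuple(
        sum(w * pos[d] for w, (_, pos) in zip(weights, refs)) for d in range(dim)
    )


def localize(csi_features, database, context_sizes=(1, 2, 4),
             selector_logits=(0.0, 0.0, 0.0)):
    """Softly combine experts that each see a different number of references."""
    query_emb = normalize(csi_features)
    gate = softmax(list(selector_logits))  # fixed here; learnable in the paper
    preds = [
        expert_predict(query_emb, database, min(k, len(database)))
        for k in context_sizes
    ]
    dim = len(preds[0])
    return tuple(sum(g * p[d] for g, p in zip(gate, preds)) for d in range(dim))


if __name__ == "__main__":
    # Made-up fingerprints: (feature vector, 2-D position in metres).
    scene = build_database([
        ([1.0, 0.0, 0.0], (0.0, 0.0)),
        ([0.0, 1.0, 0.0], (10.0, 0.0)),
        ([0.0, 0.0, 1.0], (0.0, 10.0)),
        ([1.0, 1.0, 0.0], (5.0, 0.0)),
    ])
    print(localize([0.9, 0.1, 0.0], scene))
```

In this sketch, adapting to a new scene means calling `build_database` on that scene's fingerprints; none of the other functions change, which mirrors the abstract's idea of keeping scene-specific information outside the model weights.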