Beam Selection in ISAC using Contextual Bandit with Multi-modal Transformer and Transfer Learning

📅 2025-03-11
🤖 AI Summary
To address the loss of spectral efficiency (SE) caused by suboptimal beam selection in integrated sensing and communication (ISAC) systems operating in complex indoor environments, this paper proposes a framework that uses ISAC sensing data to guide beam selection. First, a multi-modal transformer captures inter-modal relationships in the sensing data, which improves generalization across diverse scenarios. Second, a multi-agent contextual bandit algorithm performs the beam selection. Third, transfer reinforcement learning reduces training time and improves performance in multi-user environments. The main contribution is the combination of a multi-modal transformer with a multi-agent contextual bandit for ISAC beam selection. On the DeepSense 6G dataset, the proposed method improves average SE regret by 49.6% over conventional deep reinforcement learning (DRL) in the single-user scenario. In the multi-user scenario, transfer learning improves average SE regret by 19.7% compared to training from scratch, even when the from-scratch model is trained 100 times longer.

📝 Abstract
Sixth-generation (6G) wireless technology is anticipated to introduce Integrated Sensing and Communication (ISAC) as a transformative paradigm. ISAC unifies wireless communication with radar or other forms of sensing to optimize spectral and hardware resources. This paper presents a framework that leverages ISAC sensing data to enhance beam selection in complex indoor environments. By integrating multi-modal transformer models with a multi-agent contextual bandit algorithm, our approach uses ISAC sensing data to improve communication performance and achieve high spectral efficiency (SE). Specifically, the multi-modal transformer captures inter-modal relationships, enhancing model generalization across diverse scenarios. Experimental evaluations on the DeepSense 6G dataset demonstrate that our model outperforms traditional deep reinforcement learning (DRL) methods, achieving superior beam prediction accuracy and adaptability. In the single-user scenario, we achieve an average SE regret improvement of 49.6% compared to DRL. Furthermore, we employ transfer reinforcement learning to reduce training time and improve model performance in multi-user environments. In the multi-user scenario, this approach improves the average SE regret, a measure of how far the learned policy is from the optimal SE policy, by 19.7% compared to training from scratch, even when the latter is trained 100 times longer.
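
The abstract describes average SE regret only in words. One common way to write such a quantity is sketched below. The symbols are assumptions introduced here for illustration, not notation taken from the paper: $\mathcal{B}$ is the beam codebook, $b_t$ the beam chosen at step $t$, $\mathrm{SE}_t(b)$ the spectral efficiency beam $b$ would achieve at step $t$, and $T$ the number of evaluated steps.

```latex
\bar{R}_T \;=\; \frac{1}{T} \sum_{t=1}^{T} \Bigl( \mathrm{SE}^{\star}_t - \mathrm{SE}_t(b_t) \Bigr),
\qquad
\mathrm{SE}^{\star}_t \;=\; \max_{b \in \mathcal{B}} \mathrm{SE}_t(b)
```

Under this reading, a policy that always picks the best beam has $\bar{R}_T = 0$, and lower values are better. If the reported improvements are relative reductions (also an assumption), the 49.6% figure would correspond to $(\bar{R}^{\mathrm{DRL}}_T - \bar{R}^{\mathrm{proposed}}_T) / \bar{R}^{\mathrm{DRL}}_T = 0.496$.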

Problem

Research questions and friction points this paper is trying to address.

Beam selection in ISAC systems that makes use of multi-modal sensing data
Maintaining high spectral efficiency (SE) in complex indoor environments
Long training times in multi-user environments, addressed with transfer reinforcement learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

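Multi-modal transformer captures inter-modal relationships, improving generalization across diverse scenarios
Multi-agent contextual bandit algorithm drives beam selection from ISAC sensing data
Transfer reinforcement learning improves average SE regret by 19.7% over training from scratch in the multi-user scenario, even when the latter is trained 100 times longer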
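
To make the contextual-bandit idea concrete, here is a minimal sketch of beam selection as a bandit problem, with SE regret tracked per step. It is not the paper's method. It swaps in a tabular epsilon-greedy learner for the multi-agent bandit, a small integer context for the transformer-encoded sensing features, and a toy SE model for the DeepSense 6G data. All names (`spectral_efficiency`, `EpsilonGreedyContextualBandit`, `run`) and all parameter values are hypothetical.

```python
import math
import random


def spectral_efficiency(context: int, beam: int, num_beams: int) -> float:
    """Toy SE model (assumption): SNR falls off with circular distance to the best beam."""
    best_beam = context % num_beams
    offset = abs(beam - best_beam)
    distance = min(offset, num_beams - offset)
    snr = 100.0 / (1.0 + distance) ** 2  # linear SNR, illustrative values only
    return math.log2(1.0 + snr)


class EpsilonGreedyContextualBandit:
    """Tabular stand-in: one running-mean SE estimate per (context, beam) pair."""

    def __init__(self, num_contexts, num_beams, epsilon=0.1, rng=None):
        self.num_beams = num_beams
        self.epsilon = epsilon
        self.rng = rng or random.Random(0)
        self.value = [[0.0] * num_beams for _ in range(num_contexts)]
        self.count = [[0] * num_beams for _ in range(num_contexts)]

    def select(self, context):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.num_beams)
        row = self.value[context]
        return max(range(self.num_beams), key=row.__getitem__)

    def update(self, context, beam, reward):
        # Incremental mean of the observed SE for this (context, beam) pair.
        self.count[context][beam] += 1
        n = self.count[context][beam]
        self.value[context][beam] += (reward - self.value[context][beam]) / n


def run(num_steps=8000, num_contexts=8, num_beams=16, seed=0):
    """Simulate beam selection and return the per-step SE regret."""
    rng = random.Random(seed)
    agent = EpsilonGreedyContextualBandit(num_contexts, num_beams, 0.1, rng)
    regrets = []
    for _ in range(num_steps):
        context = rng.randrange(num_contexts)  # e.g. a quantized sensing observation
        beam = agent.select(context)
        se = spectral_efficiency(context, beam, num_beams)
        best = max(spectral_efficiency(context, b, num_beams) for b in range(num_beams))
        agent.update(context, beam, se)
        regrets.append(best - se)  # gap to the optimal-SE beam at this step
    return regrets


if __name__ == "__main__":
    regrets = run()
    print("average SE regret, first 1000 steps:", sum(regrets[:1000]) / 1000)
    print("average SE regret, last 1000 steps:", sum(regrets[-1000:]) / 1000)
```

In this toy setting the average regret over the last steps should fall below that of the first steps as the agent learns which beam suits each context. A transfer-learning variant could start a new agent from a previously learned `value` table instead of zeros, which is loosely analogous to the warm start the paper describes for multi-user training.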