🤖 AI Summary
To address the challenge of high training overhead and latency in millimeter-wave (mmWave) beam prediction, this paper pioneers the integration of large language models (LLMs) into the beam prediction task. The proposed framework combines vision and semantics through reprogramming: RGB images capture user equipment (UE) spatial positions, while prompt-based reprogramming aligns the resulting visual-temporal features with the LLM's semantic space. This cross-modal alignment enhances few-shot generalization and robustness in dynamic environments. Evaluated on a realistic vehicle-to-infrastructure (V2I) scenario, the method achieves 61.01% top-1 and 97.39% top-3 beam prediction accuracy in standard prediction tasks; in few-shot prediction, accuracy degrades by only 12.56% (top-1) and 5.55% (top-3) from time sample 1 to 10. These results demonstrate the framework's effectiveness under stringent low-overhead and low-latency constraints.
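The key technical step is the reprogramming layer that re-expresses visual-temporal features in a form a frozen LLM can consume. The paper's code is not reproduced here; the following is a minimal sketch of one common prompt-based reprogramming design (cross-attention against a small set of learned text prototypes, in the style of Time-LLM). All module names, dimensions, and the prototype mechanism are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class VisualReprogramming(nn.Module):
    """Hypothetical sketch: align visual-temporal features with a frozen
    LLM's embedding space via cross-attention over learned text prototypes.
    Dimensions and structure are assumptions, not the paper's code."""

    def __init__(self, vis_dim=512, llm_dim=768, n_prototypes=100, n_heads=8):
        super().__init__()
        # Trainable "text prototype" vectors living in the LLM embedding
        # space; visual features attend to these so they are re-expressed
        # in terms the frozen LLM backbone can interpret.
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, llm_dim))
        self.query_proj = nn.Linear(vis_dim, llm_dim)  # visual -> query
        self.cross_attn = nn.MultiheadAttention(llm_dim, n_heads,
                                                batch_first=True)

    def forward(self, vis_feats):
        # vis_feats: (batch, seq_len, vis_dim) visual-temporal features,
        # e.g. per-frame UE position embeddings extracted from RGB frames.
        q = self.query_proj(vis_feats)                       # (B, T, llm_dim)
        kv = self.prototypes.unsqueeze(0).expand(vis_feats.size(0), -1, -1)
        aligned, _ = self.cross_attn(q, kv, kv)              # (B, T, llm_dim)
        return aligned
```

In such a design, the aligned tokens would typically be concatenated with prompt token embeddings and passed through the frozen LLM, with only the reprogramming layer and an output head trained, which is what keeps the training overhead low.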
📝 Abstract
In this paper, we propose BeamLLM, a vision-aided millimeter-wave (mmWave) beam prediction framework leveraging large language models (LLMs) to address the challenges of high training overhead and latency in mmWave communication systems. By combining computer vision (CV) with LLMs' cross-modal reasoning capabilities, the framework extracts user equipment (UE) positional features from RGB images and aligns visual-temporal features with LLMs' semantic space through reprogramming techniques. Evaluated on a realistic vehicle-to-infrastructure (V2I) scenario, the proposed method achieves 61.01% top-1 accuracy and 97.39% top-3 accuracy in standard prediction tasks, significantly outperforming traditional deep learning models. In few-shot prediction scenarios, the performance degradation is limited to 12.56% (top-1) and 5.55% (top-3) from time sample 1 to 10, demonstrating superior prediction capability.
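For reference, top-k beam prediction accuracy counts a sample as correct when the ground-truth beam index appears among the model's k highest-scoring beams, which is how the 61.01% top-1 and 97.39% top-3 figures above are defined. A minimal sketch of the metric (function and tensor names are illustrative):

```python
import torch

def topk_beam_accuracy(logits, true_beams, k=3):
    """Fraction of samples whose ground-truth beam index lies among the
    k highest-scoring beams. logits: (N, num_beams); true_beams: (N,)."""
    topk = logits.topk(k, dim=-1).indices              # (N, k) candidates
    hits = (topk == true_beams.unsqueeze(-1)).any(-1)  # (N,) bool per sample
    return hits.float().mean().item()

# Example usage over a batch of model outputs:
# acc1 = topk_beam_accuracy(logits, true_beams, k=1)
# acc3 = topk_beam_accuracy(logits, true_beams, k=3)
```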