🤖 AI Summary
Existing RSSI estimation methods suffer from high feedback overhead, measurement instability, or reliance on auxiliary sensors and specialized hardware, which hinders robust proactive link adaptation in non-line-of-sight (NLoS) scenarios. This work proposes MulViT-TF, the first purely vision-based multi-view RSSI estimation framework that operates without any auxiliary inputs. Distributed cameras capture multi-view images, Vision Transformers extract visual features from each view, and a lightweight cross-view attention mechanism fuses those features for end-to-end regression. Experiments in two indoor environments show that, compared with the best single-view baseline, MulViT-TF reduces RMSE by up to 26.3% and improves 3 dB error coverage by up to 13.8 percentage points, while using fewer parameters and FLOPs.
📝 Abstract
Received Signal Strength Indicator (RSSI) estimation is essential for wireless link management, yet conventional feedback-based approaches incur uplink overhead, suffer from measurement instability, and are subject to inherent feedback-loop latency, rendering proactive adaptation infeasible. Although vision-based approaches have been explored, existing methods remain limited by their dependence on specialized hardware or auxiliary inputs, and they lack the spatial diversity needed to resolve camera-side NLoS conditions. To address these limitations, we propose MulViT-TF, a vision-only RSSI estimation framework that exploits distributed multi-view observations through Transformer-based fusion, achieving complementary spatial coverage without any auxiliary sensing inputs. Experimental results across two distinct indoor scenes demonstrate that MulViT-TF reduces RMSE by up to 26.3% and improves the 3 dB error coverage by up to 13.8 percentage points over the best-performing single-view baseline, while using fewer FLOPs and parameters.
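The two reported metrics can be made concrete with a minimal sketch. It assumes that "3 dB error coverage" means the percentage of samples whose absolute estimation error is at most 3 dB, that the RMSE figure is a relative reduction against the baseline, and that the coverage figure is an absolute difference in percentage points. The function names and the toy RSSI values are illustrative only and are not taken from the paper.

```python
import math


def rmse(pred, true):
    """Root-mean-square error between estimated and measured RSSI (dB)."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def coverage_within(pred, true, tol_db=3.0):
    """Percentage of samples with absolute error <= tol_db (assumed definition)."""
    hits = sum(1 for p, t in zip(pred, true) if abs(p - t) <= tol_db)
    return 100.0 * hits / len(true)


def relative_reduction(baseline, proposed):
    """Relative reduction in percent, e.g. for comparing RMSE values."""
    return 100.0 * (baseline - proposed) / baseline


# Toy values in dBm, invented for illustration (not results from the paper).
measured = [-60.0, -55.0, -70.0, -65.0]
single_view = [-64.0, -56.0, -66.0, -63.0]
multi_view = [-62.0, -56.0, -69.0, -67.0]

rmse_single = rmse(single_view, measured)
rmse_multi = rmse(multi_view, measured)

# RMSE improvement is a relative change (percent of the baseline RMSE).
rmse_gain_pct = relative_reduction(rmse_single, rmse_multi)

# Coverage improvement is a difference of two percentages (percentage points).
coverage_gain_pp = (coverage_within(multi_view, measured)
                    - coverage_within(single_view, measured))
```

Keeping the two kinds of improvement separate matters when reading the results: a gain of 13.8 percentage points in coverage is an absolute shift in the share of samples within tolerance, whereas a 26.3% RMSE reduction is measured relative to the baseline's own error.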