How Big Should a Wireless Foundation Model Be?

📅 2026-05-08

🤖 AI Summary

This work addresses the mismatch between model parameter count and the physical characteristics of wireless channels by proposing a scaling framework grounded in the nonlinear manifold dimension of the channel (dNL). Physical constraints (Maxwell's equations, finite scatterers, and antenna aperture) limit wireless channels to roughly 5-35 degrees of freedom, and the framework argues that this channel geometry, not parameter count alone, governs model performance, yielding a wireless AI scaling law bottlenecked by dNL. For NTN satellite channels with dNL ≈ 14, scaling gains diminish rapidly beyond about 30M parameters and flatten above 70M, where growing the model from 96M to 150M parameters yields only 0.52 dB. Combined with pilot-aided test-time training (TTT), a 12M-parameter model surpasses a static 96M-parameter model by 9.9 dB in NMSE (SNR = 20 dB) and 7.6 dB in MCM (SNR = 10 dB).
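To put the headline numbers in linear terms, the short sketch below converts the reported dB gains into error ratios and checks the parameter fraction. It assumes the gains follow the usual power-ratio convention (10·log10), which this summary does not state explicitly; the function and variable names are illustrative only.

```python
def db_to_ratio(gain_db: float) -> float:
    """Convert a dB gain to a linear ratio, assuming the 10*log10 power convention."""
    return 10.0 ** (gain_db / 10.0)

# Figures quoted in the summary above.
nmse_gain_db = 9.9       # 12M model with TTT vs. static 96M model (NMSE)
mcm_gain_db = 7.6        # same comparison (MCM)
small_params = 12e6
large_params = 96e6

nmse_ratio = db_to_ratio(nmse_gain_db)       # factor by which NMSE is reduced
mcm_ratio = db_to_ratio(mcm_gain_db)         # factor by which the MCM error is reduced
param_fraction = small_params / large_params  # share of the larger model's parameters

print(f"NMSE reduced by a factor of {nmse_ratio:.2f}")
print(f"MCM error reduced by a factor of {mcm_ratio:.2f}")
print(f"Parameter fraction: {param_fraction:.3f}")
```

Under that assumption, the 9.9 dB and 7.6 dB gains correspond to roughly 9.8 times and 5.8 times lower error, achieved with one-eighth of the parameters.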

📝 Abstract

Wireless foundation models are rapidly emerging as a key enabler of AI-native communication systems, yet a fundamental question remains unanswered: how large should these models be? We present a principled, physics-grounded answer, showing that the intrinsic dimensionality (dNL, the nonlinear manifold dimension of the channel) acts as the fundamental bottleneck, defining the scaling ceiling once a data-sufficient regime is reached. This dimensionality is not a design choice but a physical constraint: Maxwell's equations, finite scatterers, and antenna aperture inherently constrain wireless propagation environments to a limited number of degrees of freedom (spanning 5-35 across both real-world OTA measurements and 3GPP-standardized channel models we evaluate), orders of magnitude below the ~1,000-dimensional semantic space of language. As a consequence, we propose a scaling framework for wireless AI: taking NTN satellite channels as a representative case (dNL ≈ 14), scaling gains diminish rapidly beyond ~30 million parameters, entering a stochastic asymptote above 70M where a further 1.6× increase (96M to 150M) yields only 0.52 dB. Beyond this ceiling, inference-time adaptation via pilot-aided test-time training (TTT) is far more effective: a compact 12M-parameter model surpasses a static 96M model by 9.9 dB (NMSE, SNR = 20 dB) and 7.6 dB (MCM, SNR = 10 dB) at one-eighth the parameters. With dNL distributions validated across real-world indoor massive MIMO measurements, our scaling laws and TTT gains are demonstrated through NTN satellite simulations, reframing wireless AI design: channel geometry, not model size, fundamentally governs the scaling laws of physical-layer wireless AI.
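The abstract does not say how dNL is estimated, so the following is only a generic illustration of what "intrinsic dimensionality" means for high-dimensional samples. It uses the maximum-likelihood form of the two-nearest-neighbour (TwoNN) estimator, a swapped-in technique rather than the authors' method, on synthetic data: a 5-dimensional latent variable embedded nonlinearly in 64 ambient dimensions. The toy manifold, the sizes, and all function names are assumptions made for this sketch, and no real channel data is involved.

```python
import numpy as np

def toy_manifold(n: int, latent_dim: int, ambient_dim: int, seed: int = 0) -> np.ndarray:
    """Synthetic stand-in for channel samples: a smooth latent_dim-dimensional
    manifold embedded in ambient_dim dimensions (hypothetical data)."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(0.0, 1.0, size=(n, latent_dim))
    # Nonlinear lift of the latent coordinates, shape (n, 3 * latent_dim).
    feats = np.concatenate([z, np.sin(z), np.cos(z)], axis=1)
    # Orthonormal columns, so the linear map into ambient space preserves distances.
    q, _ = np.linalg.qr(rng.standard_normal((ambient_dim, 3 * latent_dim)))
    return feats @ q.T

def twonn_dimension(x: np.ndarray) -> float:
    """TwoNN maximum-likelihood estimate: d = N / sum(log(r2 / r1)), where r1 and r2
    are each point's distances to its first and second nearest neighbours."""
    sq_norms = np.sum(x * x, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    np.maximum(d2, 0.0, out=d2)    # clip tiny negatives from round-off
    np.fill_diagonal(d2, np.inf)   # exclude each point's distance to itself
    nearest = np.partition(d2, 1, axis=1)[:, :2]  # two smallest squared distances per row
    log_mu = 0.5 * np.log(nearest[:, 1] / nearest[:, 0])  # log(r2 / r1)
    return float(len(x) / np.sum(log_mu))

samples = toy_manifold(n=2000, latent_dim=5, ambient_dim=64)
d_hat = twonn_dimension(samples)
print(f"ambient dimension: {samples.shape[1]}, estimated intrinsic dimension: {d_hat:.2f}")
```

The estimate should land near the latent dimension rather than the ambient one, which is the sense in which a channel with many antenna or subcarrier coefficients can still have only a few degrees of freedom. Estimators of this kind are sensitive to sample count, noise, and boundary effects, so the value is indicative only.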

Problem

Research questions and friction points this paper is trying to address.

wireless foundation models

model scaling

intrinsic dimensionality

channel geometry

AI-native communication

Innovation

Methods, ideas, or system contributions that make the work stand out.

wireless foundation model

intrinsic dimensionality

scaling law