🤖 AI Summary
This work addresses weight errors in AI models caused by quantum tunneling at advanced process nodes, whose structure conventional Gaussian noise assumptions fail to capture. Starting from first principles with the WKB approximation, the authors derive an error distribution with three key characteristics: an affine mean shift, a per-bit variance hierarchy dominated by the most-significant bit (MSB), and a per-layer dependence on the weight ℓ∞ norm and the network Jacobian. Based on these properties, they propose Tunneling-Aware Compensation (TAC), a deployment-time algorithm that needs no retraining, no labels, and no inference-time overhead; it combines closed-form mean correction with layer-adaptive bit-budget allocation. They further introduce a closed-form saturation ratio ρ* that predicts the gains in advance. Experiments show that TAC reaches 95% of clean accuracy at flip rates of 0.10 (four convolutional architectures) and 0.05 (a transformer encoder) with 3.4× to 33.6× less ECC overhead than Uniform-MSP, and that on heterogeneous architectures WKB-derived scoring outperforms magnitude-based allocation by up to 24 percentage points at small bit budgets.
📝 Abstract
Transistor scaling is approaching a quantum-mechanical limit, as thin gate oxides induce electron leakage through quantum tunneling. Unlike conventional digital systems, AI inference can tolerate such errors provided their structure is modeled correctly. In this paper, we introduce quantum tunneling-aware machine learning (QTAML). We derive the deployment-time weight-error distribution from first principles using the Wentzel-Kramers-Brillouin (WKB) approximation and show that it has structure that generic Gaussian noise models miss: an exact affine mean drift, a per-bit variance hierarchy dominated by the most-significant bit, and a per-layer dependence on $\|W_\ell\|_\infty$ and the trained-network Jacobian. We package these three structural properties into a single deployment-time algorithm, Tunneling-Aware Compensation (TAC), that combines closed-form mean correction with an optimal layer-adaptive bit-budget allocation derived from the WKB variance decomposition. Across four convolutional architectures at $p_\mathrm{flip}=0.10$ and a transformer encoder at $p_\mathrm{flip}=0.05$, TAC reaches $95\%$ of clean accuracy with $3.4\times$ to $33.6\times$ less ECC overhead than Uniform-MSP, the natural baseline derived from the same physics. The closed-form saturation ratio $\rho^*$ predicts these gains in advance, and on heterogeneous architectures WKB-derived scoring outperforms magnitude-based allocation by up to 24 percentage points at small budgets. The algorithm requires no retraining, no labels, and no inference-time overhead. We also verify the WKB-derived distributional theorems to Monte Carlo precision. These results connect WKB tunneling physics with noise-aware deep learning and suggest a principled path toward hardware--software co-design beyond conventional scaling limits.
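To make the affine mean drift and the MSB-dominated variance hierarchy concrete, here is a minimal sketch under a simplified error model. It is not the paper's WKB derivation, which the abstract does not reproduce. It assumes unsigned fixed-point weight codes of `n_bits` bits, with each bit flipping independently and symmetrically with probability `p_flip`. Under that assumption the noisy code has mean (1 - 2p)q + p(2^B - 1), and bit k contributes variance p(1 - p)4^k. The function names `flip_bits`, `mean_correct`, and `per_bit_variance` are hypothetical and chosen for illustration only.

```python
import numpy as np


def flip_bits(q, n_bits, p_flip, rng):
    """Flip each of the n_bits bits of every code in q independently.

    Assumption: symmetric, independent flips with probability p_flip.
    """
    q = np.asarray(q, dtype=np.int64)
    # One Bernoulli draw per (bit position, weight) pair.
    flips = rng.random((n_bits,) + q.shape) < p_flip
    place_values = np.left_shift(1, np.arange(n_bits, dtype=np.int64))
    # Pack the per-bit flip decisions into one integer XOR mask per weight.
    mask = np.tensordot(place_values, flips.astype(np.int64), axes=(0, 0))
    return q ^ mask


def mean_correct(q_noisy, n_bits, p_flip):
    """Invert the affine drift E[q'] = (1 - 2p) q + p (2^B - 1)."""
    offset = p_flip * (2 ** n_bits - 1)
    return (np.asarray(q_noisy, dtype=np.float64) - offset) / (1.0 - 2.0 * p_flip)


def per_bit_variance(n_bits, p_flip):
    """Variance contributed by bit k (index 0 = LSB): p (1 - p) 4^k."""
    return p_flip * (1.0 - p_flip) * 4.0 ** np.arange(n_bits)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_bits, p_flip = 8, 0.10
    q = np.full(200_000, 200, dtype=np.int64)  # many noisy copies of one code
    q_noisy = flip_bits(q, n_bits, p_flip, rng)

    print("raw mean:", q_noisy.mean())
    print("corrected mean:", mean_correct(q_noisy, n_bits, p_flip).mean())

    var_k = per_bit_variance(n_bits, p_flip)
    print("MSB share of variance:", var_k[-1] / var_k.sum())
```

In this toy model the correction is a fixed affine map applied once to the stored weights, which is consistent with the abstract's claim of no inference-time overhead. Because the MSB carries roughly three quarters of the total variance here, it suggests why protecting high-order bits first is a natural baseline. The paper's layer-adaptive allocation additionally uses $\|W_\ell\|_\infty$ and the Jacobian, which this sketch does not model.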