🤖 AI Summary
This study investigates whether small Transformer models can learn number-theoretic functions, specifically the Möbius function μ(n) and the squarefree indicator μ²(n). Method: We use a supervised learning framework trained on integer sequences and apply iterative interpretability analysis, including linear probe classifiers and feature visualization, to uncover the model's internal computational mechanisms. Contribution/Results: The analysis indicates that the model implicitly picks up prime factorization structure during training, and that its decision logic can be explained by classical number-theoretic principles. Empirically, the model attains nontrivial predictive power on unseen integers, outperforming random baselines. This work offers an interpretable, empirically checkable case study of a neural network encoding elementary number theory, connecting deep learning with number theory.
📝 Abstract
Building on work of Charton, we train small transformer models to calculate the Möbius function $\mu(n)$ and the squarefree indicator function $\mu^2(n)$. The models attain nontrivial predictive power. We then iteratively train additional models to understand how the model functions, ultimately finding a theoretical explanation.
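For readers unfamiliar with the two target functions, the following is a minimal sketch of their standard textbook definitions: $\mu(n) = 0$ if a squared prime divides $n$, and $\mu(n) = (-1)^k$ if $n$ is a product of $k$ distinct primes, so that $\mu^2(n) = 1$ exactly when $n$ is squarefree. It is not the authors' code or data pipeline. The function names `mobius` and `squarefree_indicator` are illustrative, and trial division is used only because it is short and easy to check.

```python
def mobius(n: int) -> int:
    """Return mu(n) by trial division (illustrative, not the paper's method).

    mu(n) is 0 if some prime squared divides n, and (-1)**k if n is a
    product of k distinct primes. By convention mu(1) = 1.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                # p divides the original n at least twice, so n is not squarefree.
                return 0
            # p divides n exactly once: flip the sign.
            result = -result
        p += 1
    if n > 1:
        # Whatever remains is a single prime factor larger than sqrt(n).
        result = -result
    return result


def squarefree_indicator(n: int) -> int:
    """Return mu(n)**2, which is 1 if n is squarefree and 0 otherwise."""
    return mobius(n) ** 2


if __name__ == "__main__":
    for n in range(1, 13):
        print(n, mobius(n), squarefree_indicator(n))
```

A model that predicts $\mu^2(n)$ only has to decide whether $n$ is squarefree, while predicting $\mu(n)$ additionally requires the parity of the number of prime factors.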