🤖 AI Summary
This paper addresses a fundamental limitation of large language models (LLMs): the number of facts a model can store in its weights is bounded by its parameter count. We systematically investigate the theoretical advantages of tool augmentation (e.g., retrieval, API calls) over weight-based internal memory for factual recall. We study "in-tool learning", which uses external knowledge retrieval to sidestep the hard limit that parameter count places on memory capacity. Combining circuit complexity analysis, controlled ablation experiments, and a strategy for teaching tool use to pretrained models, we prove that tool augmentation enables factual recall that scales without a parameter-imposed bound. Empirical results show that tool-augmented models consistently outperform purely parametric models in factual recall accuracy and out-of-distribution generalization. Our work establishes a theoretical foundation and a practical framework for building LLMs whose accessible knowledge is not bounded by parameter count.
📝 Abstract
Tool-augmented language models, equipped with retrieval, memory, or external APIs, are reshaping AI, yet their theoretical advantages remain underexplored. In this paper, we address this gap by demonstrating the benefits of in-tool learning (external retrieval) over in-weight learning (memorization) for factual recall. We show that the number of facts a model can memorize solely in its weights is fundamentally limited by its parameter count. In contrast, we prove that tool-use enables unbounded factual recall via a simple and efficient circuit construction. These results are validated in controlled experiments, where tool-using models consistently outperform memorizing ones. We further show that for pretrained large language models, teaching tool-use and general rules is more effective than finetuning facts into memory. Our work provides both a theoretical and an empirical foundation, establishing why tool-augmented workflows are not just practical, but provably more scalable.
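The contrast between in-weight and in-tool recall can be illustrated with a standard counting argument. The sketch below is not the paper's proof or its circuit construction. It assumes each parameter is stored in a fixed number of bits and that each fact has one of a fixed number of possible answers, and the function names (`max_memorizable_facts`, `in_tool_recall`) are hypothetical.

```python
import math


def max_memorizable_facts(num_params: int, bits_per_param: int, num_values: int) -> int:
    """Counting upper bound on arbitrary facts that weights alone can encode.

    Assumption (illustrative, not from the paper): a model with num_params
    parameters of bits_per_param bits each has at most
    2 ** (num_params * bits_per_param) distinct weight settings. Memorizing N
    facts, each with one of num_values possible answers, requires telling apart
    num_values ** N fact tables, so
    N <= num_params * bits_per_param / log2(num_values).
    """
    if num_values < 2:
        raise ValueError("num_values must be at least 2")
    total_bits = num_params * bits_per_param
    return math.floor(total_bits / math.log2(num_values))


def in_tool_recall(database: dict, query: str):
    """Toy stand-in for tool use: the answer is read from external storage.

    The lookup logic has a fixed size no matter how many facts the database
    holds, which is the intuition behind recall that is not bounded by
    parameter count.
    """
    return database.get(query)


# A hypothetical model with 1000 parameters at 16 bits each, and facts whose
# answers come from 256 possible values.
bound = max_memorizable_facts(num_params=1000, bits_per_param=16, num_values=256)

# An external store can hold more facts than that bound allows in weights.
facts = {f"capital_of_country_{i}": f"city_{i}" for i in range(5000)}
answer = in_tool_recall(facts, "capital_of_country_4999")
```

In this toy setting the weight budget caps memorization at 2000 facts, while the external store answers queries over 5000 facts using the same fixed lookup code.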