🤖 AI Summary
To address the high energy consumption, low interpretability, and heavy hardware requirements of large language models (LLMs), this paper proposes memory-based language modeling (Memory-based LM), which uses fast approximations of k-nearest neighbor (k-NN) classification in place of deep neural networks for efficient, environmentally sustainable, and interpretable next-token prediction. The method runs entirely on CPU, leveraging fast approximate nearest neighbor retrieval in the lightweight OLIFANT system to model token sequence patterns directly in memory, which yields strong memorization capacity and fully transparent decision-making. Experimental results show that Memory-based LM achieves accuracy comparable to GPT-2 and GPT-Neo on standard language modeling benchmarks while reducing inference latency by 47% and cutting carbon emissions by 92%. These improvements significantly enhance both sustainability and practical deployability, offering a viable alternative to resource-intensive LLMs.
📝 Abstract
We present memory-based language modeling as an efficient, eco-friendly alternative to deep neural network-based language modeling. It offers log-linearly scalable next-token prediction performance and strong memorization capabilities. Implementing fast approximations of k-nearest neighbor classification, memory-based language modeling leaves a relatively small ecological footprint both in training and in inference mode, as it relies fully on CPUs and attains low token latencies. Its internal workings are simple and fully transparent. We compare our implementation of memory-based language modeling, OLIFANT, with GPT-2 and GPT-Neo on next-token prediction accuracy, estimated emissions and speeds, and offer some deeper analyses of the model.
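To make the idea concrete, here is a toy sketch of memory-based next-token prediction. All training (context, next-token) pairs are stored verbatim, and prediction backs off from the longest matching context suffix to shorter ones, a simple stand-in for the fast approximate k-NN retrieval the abstract describes. The class name, back-off scheme, and data structures are illustrative assumptions, not OLIFANT's actual implementation.

```python
from collections import Counter, defaultdict


class MemoryBasedLM:
    """Toy memory-based language model (illustrative, not OLIFANT).

    Memorizes every (context, next-token) pair from the training
    corpus; prediction is a majority vote over stored examples whose
    context shares the longest suffix with the query, approximating
    nearest-neighbor matching under an overlap distance.
    """

    def __init__(self, context_size=3):
        self.context_size = context_size
        # One lookup table per suffix length j:
        # context tuple of length j -> Counter of observed next tokens.
        self.tables = [defaultdict(Counter) for _ in range(context_size + 1)]

    def train(self, tokens):
        # Pure memorization: record the next token for every context
        # suffix of length 0..context_size. No gradients, CPU-only.
        for i in range(len(tokens)):
            for j in range(min(self.context_size, i) + 1):
                ctx = tuple(tokens[i - j:i])
                self.tables[j][ctx][tokens[i]] += 1

    def predict(self, context):
        # Back off from the longest matching suffix to shorter ones.
        # Every prediction is traceable to stored training examples,
        # which is what makes the model fully transparent.
        for j in range(min(self.context_size, len(context)), -1, -1):
            ctx = tuple(context[len(context) - j:])
            if ctx in self.tables[j]:
                return self.tables[j][ctx].most_common(1)[0][0]
        return None


lm = MemoryBasedLM(context_size=2)
lm.train("the cat sat on the mat".split())
print(lm.predict("on the".split()))  # -> mat
```

A real implementation replaces the exact suffix tables with fast approximate nearest-neighbor retrieval so that lookup stays cheap as the stored corpus grows, which is where the low token latencies come from.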