Sequential Enumeration in Large Language Models

📅 2025-12-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit fundamental weaknesses in sequence counting and systematic enumeration, failing to spontaneously execute symbolic-like counting procedures. Method: We conduct the first systematic evaluation of diverse state-of-the-art LLMs—closed-source, open-source, and reasoning-augmented variants—on letter- and word-sequence naming and generation tasks. Our analysis integrates prompt engineering, chain-of-thought prompting, model-scale ablation, and dynamic embedding trajectory tracking to identify conditions for counting strategy emergence and characterize numerical representation mechanisms. Results: No model initiates counting autonomously; partial success occurs only under explicit prompting. Counting capability does not scale monotonically with parameter count. Numerical information is encoded nonlinearly and task-dependently in embedding space. These findings expose a core limitation in combinatorial generalization and provide critical empirical evidence for neuro-symbolic integration.

📝 Abstract
Reliably counting and generating sequences of items remains a significant challenge for neural networks, including Large Language Models (LLMs). Although this capability is readily handled by rule-based symbolic systems built on serial computation, neural models must instead acquire counting procedures through learning, and they struggle to deploy them systematically. Previous research has demonstrated that recurrent architectures can only approximately track and enumerate sequences of events, and it remains unclear whether modern deep learning systems, including LLMs, can deploy systematic counting procedures over sequences of discrete symbols. This paper aims to fill this gap by investigating the sequential enumeration abilities of five state-of-the-art LLMs, including proprietary, open-source, and reasoning models. We probe LLMs in sequential naming and production tasks involving lists of letters and words, adopting a variety of prompting instructions to explore the role of chain-of-thought in the spontaneous emergence of counting strategies. We also evaluate open-source models sharing the same architecture but increasing in size, to test whether mastery of counting principles follows scaling laws, and we analyze embedding dynamics during sequential enumeration to investigate the emergent encoding of numerosity. We find that some LLMs can indeed deploy counting procedures when explicitly prompted to do so, but none of them spontaneously engages in counting when simply asked to enumerate the number of items in a sequence. Our results suggest that, despite their impressive emergent abilities, LLMs cannot yet robustly and systematically deploy counting procedures, highlighting a persistent gap between neural and symbolic approaches to compositional generalization.
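The letter-sequence naming tasks described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual harness: the function names, prompt wording, and exact-match scoring rule are all assumptions for the sake of the example.

```python
import random
import re

def make_naming_task(length, seed=None):
    """Build a toy letter-sequence naming task: the model must report
    how many letters appear in a randomly generated sequence.
    (Hypothetical helper; the paper's real prompts may differ.)"""
    rng = random.Random(seed)
    letters = [rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(length)]
    prompt = (
        "Count the letters in the following sequence and "
        "answer with a single number.\n"
        f"Sequence: {' '.join(letters)}\nAnswer:"
    )
    return prompt, length

def score_response(response, target):
    """Exact-match scoring: extract the first integer in the model's
    reply and compare it with the true sequence length."""
    match = re.search(r"\d+", response)
    return match is not None and int(match.group()) == target

prompt, target = make_naming_task(7, seed=0)
assert score_response("There are 7 letters.", target)
assert not score_response("I count 8 letters.", target)
```

Under this kind of setup, the "explicit prompting" condition the paper reports would correspond to adding an instruction such as "count the items one by one" to the prompt, while the spontaneous condition asks only for the total.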
Problem

Research questions and friction points this paper is trying to address.

LLMs struggle with reliable sequential counting and generation
Whether LLMs can systematically deploy counting procedures over sequences of discrete symbols
Whether LLMs spontaneously adopt counting strategies when asked to enumerate items
Innovation

Methods, ideas, or system contributions that make the work stand out.

Probing LLMs with sequential naming and production tasks
Evaluating models of increasing size for scaling law effects
Analyzing embedding dynamics to study emergent numerosity encoding
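The embedding-dynamics analysis can be illustrated with a toy linear probe for numerosity. The sketch below is an assumption-laden stand-in: the synthetic "hidden states" (log- and sqrt-compressed counts) and the least-squares probe are invented for illustration and are not the paper's actual representations or method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for hidden states recorded at each enumeration step:
# numerosity is embedded nonlinearly (compressed), plus noise.
counts = np.arange(1, 21).astype(float)
hidden = np.stack([
    np.log(counts) + 0.05 * rng.standard_normal(counts.size),
    np.sqrt(counts) + 0.05 * rng.standard_normal(counts.size),
], axis=1)

def probe_r2(features, target):
    """Fit a linear probe (least squares with intercept) and return R^2:
    how much of the target numerosity the features explain linearly."""
    X = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    pred = X @ coef
    ss_res = np.sum((target - pred) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# A linear readout recovers a nonlinear numerosity code only
# imperfectly; comparing R^2 across tasks or layers is the probe.
r2 = probe_r2(hidden, counts)
print(round(r2, 3))
```

Comparing such probe scores across tasks and layers is one way to operationalize the finding that numerical information is encoded nonlinearly and task-dependently in embedding space.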