Semantic Content Determines Algorithmic Performance

📅 2026-01-29
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the susceptibility of large language models (LLMs) to input semantics when performing algorithmic tasks that should be semantically agnostic, such as counting. Challenging the fundamental principle that algorithmic behavior ought to be independent of semantic content, the study introduces WhatCounts, an atomic counting benchmark designed to isolate semantic effects. Through controlled ablation studies and comparisons across semantic categories, while eliminating confounding factors such as reasoning complexity and prompt interference, the authors systematically demonstrate for the first time that mainstream LLMs exhibit substantial implicit dependence on input semantics. Experimental results reveal counting accuracy disparities exceeding 40% between semantic classes (e.g., cities vs. chemicals) and show that fine-tuning can unpredictably alter this dependency. These findings indicate that current models merely approximate, rather than genuinely possess, general-purpose algorithmic capabilities.

Technology Category

Application Category

๐Ÿ“ Abstract
Counting should not depend on what is being counted; more generally, any algorithm's behavior should be invariant to the semantic content of its arguments. We introduce WhatCounts to test this property in isolation. Unlike prior work that conflates semantic sensitivity with reasoning complexity or prompt variation, WhatCounts is atomic: count items in an unambiguous, delimited list with no duplicates, distractors, or reasoning steps, for different semantic types. Frontier LLMs show over 40% accuracy variation depending solely on what is being counted: cities versus chemicals, names versus symbols. Controlled ablations rule out confounds. The gap is semantic, and it shifts unpredictably with small amounts of unrelated fine-tuning. LLMs do not implement algorithms; they approximate them, and the approximation is argument-dependent. As we show with an agentic example, this has implications beyond counting: any LLM function may carry hidden dependencies on the meaning of its inputs.
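The atomic counting setup the abstract describes can be sketched as follows. Everything here is an illustrative assumption, not the paper's actual WhatCounts implementation: the item pools, prompt wording, trial counts, and scoring loop are invented for the sketch, and the "model" is a trivial comma-counter standing in for a queried LLM.

```python
import random

# Hypothetical item pools for two semantic categories (illustrative only,
# not the paper's actual WhatCounts vocabulary).
CATEGORIES = {
    "cities": ["Paris", "Jena", "Osaka", "Lima", "Oslo", "Cairo", "Quito", "Perth"],
    "chemicals": ["ethanol", "benzene", "glucose", "toluene",
                  "acetone", "urea", "xylene", "phenol"],
}

def make_counting_prompt(category, n, rng):
    """Build one atomic counting task: an unambiguous, comma-delimited list
    with no duplicates, distractors, or reasoning steps. Ground truth is n."""
    items = rng.sample(CATEGORIES[category], n)  # sample without replacement
    prompt = "How many items are in the following list? " + ", ".join(items)
    return prompt, n

def accuracy(answer_fn, category, trials=100, n_range=(3, 8), seed=0):
    """Score a model on counting within a single semantic category; comparing
    this score across categories probes semantic (in)variance."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        n = rng.randint(*n_range)
        prompt, truth = make_counting_prompt(category, n, rng)
        if answer_fn(prompt) == truth:
            correct += 1
    return correct / trials

# A trivially exact "model" that counts delimiters; a real evaluation would
# query an LLM here instead. An exact algorithm is semantics-invariant,
# so its accuracy is identical across categories.
perfect = lambda prompt: prompt.count(",") + 1

print(accuracy(perfect, "cities"))     # 1.0
print(accuracy(perfect, "chemicals"))  # 1.0
```

Swapping `perfect` for an LLM call is where, per the abstract, the per-category accuracies diverge by over 40% despite identical task structure.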
Problem

Research questions and friction points this paper is trying to address.

semantic content
algorithmic invariance
large language models
counting task
input dependency
Innovation

Methods, ideas, or system contributions that make the work stand out.

semantic sensitivity
algorithmic invariance
large language models
WhatCounts benchmark
input-dependent approximation
🔎 Similar Papers
No similar papers found.
Martino Ríos-García
Laboratory of Organic and Macromolecular Chemistry (IOMC), Friedrich Schiller University Jena, Humboldtstrasse 10, 07743 Jena, Germany
Nawaf Alampara
PhD Researcher, Friedrich Schiller University Jena
machine learning, ai4science, accelerating research, computational material science
K. Jablonka
Laboratory of Organic and Macromolecular Chemistry (IOMC), Friedrich Schiller University Jena, Humboldtstrasse 10, 07743 Jena, Germany; Center for Energy and Environmental Chemistry Jena (CEEC Jena), Friedrich Schiller University Jena, Philosophenweg 7a, 07743 Jena, Germany; HIPOLE Jena (Helmholtz Institute for Polymers in Energy Applications Jena), Lessingstrasse 12-14, 07743 Jena, Germany; Jena Center for Soft Matter (JCSM), Friedrich Schiller University Jena, Philosophenweg 7, 07743 Jena, Germany