Investigating the interaction of linguistic and mathematical reasoning in language models using multilingual number puzzles

📅 2025-06-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) perform substantially worse than humans on mathematical reasoning over cross-linguistic numeral systems, yet the underlying causes remain unclear. Method: This work systematically disentangles the linguistic structure of numerals, such as digit-order conventions (e.g., tens/units ordering) and compounding rules, from their arithmetic semantics, introducing controlled multilingual numeral experiments and a symbolic-masking ablation framework, along with a dedicated cross-lingual mathematical reasoning benchmark. Contribution/Results: Empirical analysis reveals that LLMs depend critically on explicit arithmetic operators (e.g., "+") and fail to robustly infer compositional numeral rules when the structure is implicit and operator-free. Their fundamental limitation is an inability to induce the latent arithmetic structure embedded in linguistic numerals. The study thus identifies a critical deficiency in current LLMs' joint language–mathematical representation, namely insufficient abstraction over symbolic numeral syntax and semantics, and offers both theoretical insight into this representational gap and empirically testable pathways for advancing multilingual symbolic reasoning.

📝 Abstract
Across languages, numeral systems vary widely in how they construct and combine numbers. While humans consistently learn to navigate this diversity, large language models (LLMs) struggle with linguistic-mathematical puzzles involving cross-linguistic numeral systems, which humans can learn to solve successfully. We investigate why this task is difficult for LLMs through a series of experiments that untangle the linguistic and mathematical aspects of numbers in language. Our experiments establish that models cannot consistently solve such problems unless the mathematical operations in the problems are explicitly marked using known symbols ($+$, $\times$, etc., as in "twenty + three"). In further ablation studies, we probe how individual parameters of numeral construction and combination affect performance. While humans use their linguistic understanding of numbers to make inferences about the implicit compositional structure of numerals, LLMs seem to lack this notion of implicit numeral structure. We conclude that the ability to flexibly infer compositional rules from implicit patterns in human-scale data remains an open challenge for current reasoning models.
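To make the abstract's key manipulation concrete, the following is a minimal sketch (not the authors' code; the word values and helper names are hypothetical) of the operator-marking idea: the same numeral can be presented with its compositional structure implicit ("twenty three") or with the underlying arithmetic made explicit ("twenty + three").

```python
# Hypothetical base-word values for a small English numeral fragment.
BASE = {"one": 1, "two": 2, "three": 3, "twenty": 20, "thirty": 30, "hundred": 100}

def mark_operators(numeral: str) -> str:
    """Make the implicit arithmetic explicit: insert '+' before an additive
    word and 'x' before a multiplicative base like 'hundred'."""
    words = numeral.split()
    out = [words[0]]
    for cur in words[1:]:
        out.append("x" if BASE[cur] == 100 else "+")
        out.append(cur)
    return " ".join(out)

def value(numeral: str) -> int:
    """Evaluate a numeral (marked or unmarked) to its integer value."""
    total, current = 0, 0
    for w in numeral.split():
        if w in ("+", "x"):          # operator tokens carry no value
            continue
        v = BASE[w]
        if v == 100:                 # multiplicative combination: "three hundred"
            current = (current or 1) * v
        else:                        # additive combination: "twenty three"
            total += current
            current = v
    return total + current

print(mark_operators("twenty three"))            # twenty + three
print(value("three hundred twenty three"))       # 323
```

The paper's finding, in these terms, is that models handle the output of `mark_operators` far more reliably than the unmarked surface form, because the "+" and "x" make the latent compositional rules explicit.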
Problem

Research questions and friction points this paper is trying to address.

Investigating linguistic-mathematical reasoning gaps in multilingual language models
Analyzing LLMs' struggles with cross-linguistic numeral system puzzles
Exploring models' inability to infer implicit numeral compositional rules
Innovation

Methods, ideas, or system contributions that make the work stand out.

Explicit mathematical symbols improve LLM performance
Ablation studies probe how individual numeral construction parameters affect performance
LLMs lack implicit numeral structure understanding
Antara R. Bhattacharya
Computer Science Department, Harvard University
Isabel Papadimitriou
Kempner Institute for the Study of Natural and Artificial Intelligence at Harvard University
Kathryn Davidson
Professor of Linguistics, Harvard University
Formal semantics/pragmatics · Experimental semantics · Sign languages · Cognitive Science
David Alvarez-Melis
Harvard University & Microsoft Research
Machine Learning · Optimal Transport · Natural Language Processing · Interpretability