MathBode: Frequency-Domain Fingerprints of LLM Mathematical Reasoning

📅 2025-09-27
🤖 AI Summary
Problem: Existing evaluations of LLM mathematical reasoning rely heavily on static, one-shot accuracy, which does not show how well a model's answers track changing inputs. Method: We propose MathBode, a dynamic diagnostic that applies control-theoretic Bode analysis to LLM mathematical reasoning. It treats each parametric problem as a system, drives a single parameter sinusoidally, and fits the first-harmonic responses of the model outputs and the exact solutions to extract gain and phase "fingerprints" at each frequency. Contribution/Results: Across five closed-form problem families, with a symbolic baseline used to calibrate the instrument, MathBode surfaces systematic low-pass behavior and growing phase lag that accuracy alone obscures. The resulting measurements of reasoning fidelity and consistency separate frontier from mid-tier models, the protocol is compact and reproducible, and the dataset and code are open-sourced.

📝 Abstract
This paper presents MathBode, a dynamic diagnostic for mathematical reasoning in large language models (LLMs). Instead of one-shot accuracy, MathBode treats each parametric problem as a system: we drive a single parameter sinusoidally and fit first-harmonic responses of model outputs and exact solutions. This yields interpretable, frequency-resolved metrics -- gain (amplitude tracking) and phase (lag) -- that form Bode-style fingerprints. Across five closed-form families (linear solve, ratio/saturation, compound interest, 2x2 linear systems, similar triangles), the diagnostic surfaces systematic low-pass behavior and growing phase lag that accuracy alone obscures. We compare several models against a symbolic baseline that calibrates the instrument ($G \approx 1$, $\phi \approx 0$). Results separate frontier from mid-tier models on dynamics, providing a compact, reproducible protocol that complements standard benchmarks with actionable measurements of reasoning fidelity and consistency. We open-source the dataset and code to enable further research and adoption.
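
The first-harmonic fit described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' released code: the function names, the least-squares formulation, the phase sign convention (negative means lag), and all numeric values are assumptions. The "model" output is a planted attenuated and delayed sinusoid standing in for real LLM answers on the linear-solve family.

```python
import numpy as np

def first_harmonic(y, t, omega):
    """Least-squares fit y ~ c + a*sin(omega*t) + b*cos(omega*t).

    Returns (amplitude, phase) of the equivalent A*sin(omega*t + phase).
    """
    X = np.column_stack([np.ones_like(t), np.sin(omega * t), np.cos(omega * t)])
    _, a, b = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.hypot(a, b), np.arctan2(b, a)

def gain_phase(y_model, y_exact, t, omega):
    """Gain = amplitude ratio, phase = phase difference wrapped to [-pi, pi)."""
    amp_m, ph_m = first_harmonic(y_model, t, omega)
    amp_e, ph_e = first_harmonic(y_exact, t, omega)
    dphi = (ph_m - ph_e + np.pi) % (2 * np.pi) - np.pi
    return amp_m / amp_e, dphi

# Linear-solve family (assumed form): a*x + b = c, so exact x = (c - b) / a.
a_coef, b_coef, c0, eps = 2.0, 1.0, 7.0, 0.5   # hypothetical values
t = np.arange(64, dtype=float)                  # one query per time step
omega = 2 * np.pi * 4 / 64                      # 4 cycles over the sweep

c = c0 + eps * np.sin(omega * t)                # sinusoidally driven parameter
x_exact = (c - b_coef) / a_coef                 # exact solution at each step

# Stand-in for LLM answers: planted gain 0.8 and lag 0.3 rad (an assumption).
x_model = (c0 - b_coef) / a_coef + 0.8 * (eps / a_coef) * np.sin(omega * t - 0.3)

G, phi = gain_phase(x_model, x_exact, t, omega)
G_ref, phi_ref = gain_phase(x_exact, x_exact, t, omega)  # symbolic-baseline check
print(f"model:    G={G:.3f}, phi={phi:.3f} rad")
print(f"baseline: G={G_ref:.3f}, phi={phi_ref:.3f} rad")
```

Because the planted signal lies exactly in the span of the fitted basis, the fit recovers the planted gain and lag up to floating-point error, and feeding the exact solution through the same pipeline gives $G = 1$, $\phi = 0$, which is the role the abstract assigns to the symbolic baseline.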
Problem

Research questions and friction points this paper is trying to address.

Diagnosing mathematical reasoning dynamics in LLMs using frequency-domain analysis
Revealing systematic low-pass behavior and phase lag in model responses
Providing interpretable metrics beyond accuracy for reasoning fidelity evaluation
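
To make "low-pass behavior and phase lag" concrete, the sketch below sweeps several drive frequencies over a synthetic responder and collects gain and phase at each one. The responder is plain exponential smoothing, chosen only because it is a textbook first-order low-pass; it is not an LLM and not data from the paper. The frequency grid, smoothing factor, and warm-up length are likewise assumptions.

```python
import numpy as np

def fit_amp_phase(y, t, omega):
    """First-harmonic least-squares fit; returns (amplitude, phase)."""
    X = np.column_stack([np.ones_like(t), np.sin(omega * t), np.cos(omega * t)])
    _, a, b = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.hypot(a, b), np.arctan2(b, a)

def lagged_responder(u, alpha=0.3):
    """Synthetic stand-in (NOT an LLM): y[n] = (1 - alpha)*y[n-1] + alpha*u[n]."""
    y = np.empty_like(u)
    y[0] = u[0]
    for n in range(1, len(u)):
        y[n] = (1 - alpha) * y[n - 1] + alpha * u[n]
    return y

n_steps, warmup = 512, 256          # discard the first half as transient
t = np.arange(n_steps, dtype=float)

fingerprint = []                    # list of (omega, gain, phase) tuples
for cycles in (1, 2, 4, 8):         # cycles per 64 steps (hypothetical grid)
    omega = 2 * np.pi * cycles / 64
    u = np.sin(omega * t)           # driven input, playing the exact solution
    y = lagged_responder(u)         # sluggish response to that input
    amp_u, ph_u = fit_amp_phase(u[warmup:], t[warmup:], omega)
    amp_y, ph_y = fit_amp_phase(y[warmup:], t[warmup:], omega)
    gain = amp_y / amp_u
    phase = (ph_y - ph_u + np.pi) % (2 * np.pi) - np.pi
    fingerprint.append((omega, gain, phase))

for omega, gain, phase in fingerprint:
    print(f"omega={omega:.3f}  gain={gain:.3f}  phase={phase:.3f} rad")
```

On this grid the gain falls and the phase becomes more negative as the frequency rises, which is the qualitative shape that a Bode-style fingerprint would show for a system that tracks slow parameter changes better than fast ones.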
Innovation

Methods, ideas, or system contributions that make the work stand out.

Frequency-domain analysis of LLM mathematical reasoning
Bode-style fingerprints with gain and phase metrics
Dynamic diagnostic protocol complementing standard benchmarks