Assessing Code Understanding in LLMs

📅 2025-03-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) show limited understanding of semantics-preserving, compiler-level program transformations such as copy propagation and constant folding, which are central to reliable code reasoning. The paper empirically evaluates LLMs on semantic-equivalence judgments over transformed code. Results: models fail in roughly 41% of cases when no context is provided, and still in 29% of cases even with a simple generic context, exposing blind spots in deep modeling of code semantics. To address this, the authors advocate integrating LLMs with code-optimization tools, so that compiler-generated equivalence pairs can reinforce training. The work offers a quantitative methodology for assessing code-understanding capability and a tool-integrated path toward more trustworthy code AI.

📝 Abstract
We present an empirical evaluation of Large Language Models (LLMs) in code understanding of non-trivial, semantics-preserving program transformations such as copy propagation and constant folding. Our findings show that LLMs fail to judge semantic equivalence in approximately 41% of cases when no context is provided and in 29% when given a simple generic context. To improve accuracy, we advocate integrating LLMs with code-optimization tools to enhance training and facilitate more robust program understanding.
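To make the two transformations named above concrete, here is a minimal illustration (ours, not taken from the paper) of copy propagation and constant folding applied by hand; the judgment task posed to the LLM is whether the two versions are semantically equivalent:

```python
def before(x):
    y = x          # y is a plain copy of x: candidate for copy propagation
    z = 2 * 3      # constant expression: candidate for constant folding
    return y + z

# After copy propagation (y -> x) and constant folding (2 * 3 -> 6):
def after(x):
    return x + 6

# The two functions agree on every input; this is the semantic
# equivalence the evaluated models are asked to judge.
assert all(before(x) == after(x) for x in range(-100, 100))
```

A compiler performs both rewrites routinely at low optimization levels, which is what makes such pairs cheap to generate at scale.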
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' ability to understand transformed code semantics
Identifying LLMs' failure rates in semantic equivalence judgments
Proposing integration with code-optimization tools for improved accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Empirical evaluation of LLMs in code understanding
Integration with code-optimization tools for accuracy
Enhanced training for robust program understanding
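One way to read the proposed integration: an optimization pass applied to source code mechanically yields (original, transformed) pairs that are semantically equivalent by construction, and these can serve as training or evaluation data. The sketch below is a toy stand-in for a real compiler pass, not the paper's pipeline; `fold_constants` and the `Folder` class are our own illustrative names, and only a few integer operators are handled:

```python
import ast
import operator

# Map AST operator nodes to the Python operators they denote.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
}

class Folder(ast.NodeTransformer):
    """Tiny constant-folding pass over a Python AST."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold children first (bottom-up)
        if (isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and type(node.op) in OPS):
            value = OPS[type(node.op)](node.left.value, node.right.value)
            return ast.copy_location(ast.Constant(value), node)
        return node

def fold_constants(src: str) -> str:
    """Return src with constant subexpressions folded: an equivalence pair."""
    tree = Folder().visit(ast.parse(src))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

src = "def f(x):\n    return x + 2 * 3"
print(fold_constants(src))  # '2 * 3' has been folded to '6'
```

The pair `(src, fold_constants(src))` is equivalent by construction, so it can label a "semantically equivalent" training example without any human annotation.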
Cosimo Laneve
Professor of Computer Science, University of Bologna
Programming Languages, Analysis of Programs
Alvise Spano
DAIS, Ca’ Foscari University of Venice, Italy
Dalila Ressi
DAIS, Ca’ Foscari University of Venice, Italy
Sabina Rossi
DAIS, Ca’ Foscari University of Venice, Italy
Michele Bugliesi
Professor of Computer Science, Università Ca' Foscari Venezia
Static Analysis, Program Verification, Security, Distributed Systems