🤖 AI Summary
Large language models (LLMs) exhibit limited capability on dynamic execution reasoning tasks, such as program behavior prediction, bug detection, and runtime output generation, because they struggle to model control flow and execution semantics. To address this, we propose the first code–control-flow-graph (CFG) multimodal Chain-of-Thought (CoT) framework, which aligns source code snippets with their corresponding visualized CFGs to enable fine-grained cross-modal reasoning. A novel cross-modal referencing mechanism explicitly enforces consistency between textual reasoning traces and visual execution paths, overcoming a fundamental limitation of text-only CoT in dynamic program modeling. By combining CFG visualization encoding, multimodal fine-tuning, and tailored prompt engineering, our method achieves significant improvements across multiple code execution benchmarks: higher behavior prediction accuracy, more precise error localization, and better output generation quality than state-of-the-art unimodal CoT approaches.
📝 Abstract
Predicting program behavior and reasoning about code execution remain significant challenges in software engineering, particularly for large language models (LLMs) designed for code analysis. While these models excel at understanding static syntax, they often struggle with dynamic reasoning tasks. We introduce VisualCoder, a simple yet effective approach that enhances code reasoning by integrating multimodal Chain-of-Thought (CoT) reasoning with a visual Control Flow Graph (CFG). By aligning code snippets with their corresponding CFGs, VisualCoder provides deeper insights into execution flows. We address challenges in multimodal CoT integration through a reference mechanism, ensuring consistency between code and its execution path, thereby improving performance in program behavior prediction, error detection, and output generation.
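To make the code-to-CFG alignment concrete: the abstract describes pairing a code snippet with a graph of its possible execution paths. The paper's actual CFG construction and visualization pipeline is not shown here, so the following is only a minimal illustrative sketch that derives a simplified CFG (statement-level nodes, successor edges, branching on `if`) from Python source using the standard `ast` module; the function name `build_cfg` and the graph representation are our own assumptions, not VisualCoder's API.

```python
import ast

def build_cfg(source: str):
    """Build a simplified control flow graph from Python source.

    Returns (nodes, edges): nodes maps node id -> statement label,
    edges is a list of (src_id, dst_id) successor pairs.
    Only sequential statements and if/else branching are handled.
    """
    tree = ast.parse(source)
    nodes = {}          # node id -> human-readable label
    edges = []          # (predecessor id, successor id)
    counter = [0]

    def new_node(label):
        nid = counter[0]
        counter[0] += 1
        nodes[nid] = label
        return nid

    def walk(stmts, preds):
        # Wire a statement list into the graph; `preds` are the node ids
        # that flow into the first statement. Returns the ids flowing out.
        for stmt in stmts:
            if isinstance(stmt, ast.If):
                cond = new_node(f"if {ast.unparse(stmt.test)}")
                for p in preds:
                    edges.append((p, cond))
                true_exit = walk(stmt.body, [cond])
                false_exit = walk(stmt.orelse, [cond]) if stmt.orelse else [cond]
                preds = true_exit + false_exit   # branches re-merge here
            else:
                nid = new_node(ast.unparse(stmt))
                for p in preds:
                    edges.append((p, nid))
                preds = [nid]
        return preds

    entry = new_node("ENTRY")
    exits = walk(tree.body, [entry])
    exit_id = new_node("EXIT")
    for p in exits:
        edges.append((p, exit_id))
    return nodes, edges

nodes, edges = build_cfg("x = 1\nif x > 0:\n    y = 2\nelse:\n    y = 3\nz = y")
```

In a full pipeline along the lines the abstract sketches, such a graph would then be rendered (e.g., via Graphviz) into the visual CFG that accompanies the code snippet in the multimodal prompt, so the model's textual reasoning can reference concrete nodes and edges.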