Introspection of Thought Helps AI Agents

📅 2025-07-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the low reasoning efficiency and limited self-correction capability of large language models (LLMs) and multimodal LLMs (MLLMs), this paper proposes INoT, a self-introspective reasoning framework. INoT embeds LLM-readable code directly into prompts, enabling models to perform self-reflection, negation, and correction within a single forward pass, without post-training or additional parameters. The method combines programmatic prompt engineering with an internal dialogue design to realize a lightweight, intrinsic introspective reasoning mechanism. Evaluated on six benchmarks spanning three task types, INoT achieves an average accuracy improvement of 7.95% over strong baselines while consuming 58.3% fewer inference tokens on average than the best-performing baseline. These results demonstrate INoT's effectiveness, generalizability, and deployment efficiency.
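The core idea — embedding executable-looking code in the prompt so the debate-and-revise loop runs inside a single forward pass — can be illustrated with a prompt template. This is a hypothetical sketch, not the paper's actual PromptCode: the tag names, the pseudo-code operations (`solve`, `criticize`, `revise`, `agree`), and the `build_inot_prompt` helper are all assumptions for illustration.

```python
# Hypothetical sketch of an INoT-style prompt. The "LLM-Read code" below is
# plain text that the model interprets mentally, so self-denial and
# correction happen inside one forward pass instead of across repeated
# API calls (as in iterative frameworks like Iteration of Thought).

INOT_PROMPT = """\
You are two internal debaters, A and B. Mentally execute this pseudo-code:

<PromptCode>
answer_A = solve(question)
answer_B = solve(question)
for round in range(MAX_ROUNDS):
    critique_A = criticize(answer_B)   # A attacks B's reasoning
    critique_B = criticize(answer_A)   # B attacks A's reasoning
    answer_A = revise(answer_A, critique_B)
    answer_B = revise(answer_B, critique_A)
    if agree(answer_A, answer_B):
        break
return final_answer
</PromptCode>

Question: {question}
Output only the final answer reached after the internal debate.
"""


def build_inot_prompt(question: str, max_rounds: int = 3) -> str:
    """Fill the template; the LLM 'runs' the embedded code internally,
    so only one model call is needed per question."""
    return INOT_PROMPT.replace("{question}", question).replace(
        "MAX_ROUNDS", str(max_rounds)
    )


prompt = build_inot_prompt("If 3x + 5 = 20, what is x?")
```

The single resulting prompt would then be sent to the LLM or MLLM once, which is where the token savings over externally iterated reasoning come from.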

📝 Abstract
AI Agents rely on Large Language Models (LLMs) and Multimodal LLMs (MLLMs) to perform interpretation and inference in text and image tasks without post-training, so the LLMs and MLLMs play the most critical role and determine the initial abilities and limitations of AI Agents. Usually, AI Agents use sophisticated prompt engineering and external reasoning frameworks to achieve effective interaction with LLMs, e.g., Chain-of-Thought, Iteration of Thought, and Image-of-Thought. However, these are still constrained by the inherent limitations of LLMs in understanding natural language, and their iterative reasoning processes incur substantial inference cost. To this end, we propose a novel AI Agent Reasoning Framework with Introspection of Thought (INoT) based on a new LLM-Read code embedded in the prompt. It enables the LLM to execute a programmatic dialogue reasoning process by following the code in the prompt, so self-denial and reflection occur within the LLM instead of outside it, which effectively reduces token cost. Experiments on six benchmarks covering three different tasks verify the effectiveness of INoT, with an average performance improvement of 7.95% over the baselines. Furthermore, the token cost of INoT is on average 58.3% lower than that of the best-performing baseline. In addition, verification experiments demonstrate the versatility of INoT in image interpretation and inference.
Problem

Research questions and friction points this paper is trying to address.

Reducing LLM inference costs via programmatic introspection
Overcoming LLM limitations in natural language understanding
Improving multimodal task performance with code-based reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LLM-Read code for programmatic dialogue reasoning
Reduces token cost via internal self-denial and reflection
Improves performance by an average of 7.95% across six benchmarks
Haoran Sun
Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, Zhejiang, China; School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
Shaoning Zeng
Yangtze Delta Region Institute (Huzhou), UESTC
Pattern Recognition · Computer Vision · Data Augmentation · AI Agents