🤖 AI Summary
Existing research on large language models (LLMs) as autonomous agents and tool users remains fragmented and limited in architecture design, multi-agent coordination, tool integration, cognitive mechanism modeling, and evaluation frameworks. Method: This survey systematically analyzes 2023–2025 publications from top-tier conferences and journals through structured literature analysis, and examines how prompt engineering and fine-tuning techniques shape LLM implementations of core cognitive capabilities: reasoning, planning, and memory. Contribution/Results: We identify three breakthrough directions (verifiable reasoning, self-improvement, and personalized customization) and distill ten concrete future research pathways. Further, we propose a unified evaluation framework covering 68 publicly available datasets, exposing critical gaps in current benchmarks regarding task generalization, dynamic adaptability, and causal attribution.
📝 Abstract
The pursuit of human-level artificial intelligence (AI) has significantly advanced the development of autonomous agents and Large Language Models (LLMs). LLMs are now widely used as decision-making agents for their ability to interpret instructions, manage sequential tasks, and adapt through feedback. This review examines recent developments in employing LLMs as autonomous agents and tool users and is organized around seven research questions. We consider only papers published between 2023 and 2025 in A*- and A-ranked conferences and Q1 journals. We present a structured analysis of LLM agents' architectural design principles, dividing their applications into single-agent and multi-agent systems, along with strategies for integrating external tools. In addition, we investigate the cognitive mechanisms of LLMs, including reasoning, planning, and memory, and the impact of prompting methods and fine-tuning procedures on agent performance. Furthermore, we evaluate current benchmarks and assessment protocols and provide an analysis of 68 publicly available datasets used to assess the performance of LLM-based agents across tasks. Through this review, we identify critical findings on the verifiable reasoning of LLMs, their capacity for self-improvement, and the personalization of LLM-based agents. Finally, we discuss ten future research directions to address these gaps.