LLM-PySC2: Starcraft II learning environment for Large Language Models

📅 2024-11-08
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Large language models (LLMs) cannot interface with PySC2's full action space, and StarCraft II lacks native support for multi-agent (MA) coordination. Method: We introduce the first environment that gives LLMs direct access to PySC2's complete action set. Our approach features: (i) end-to-end LLM–PySC2 integration; (ii) an asynchronous MA interaction architecture designed for LLMs, incorporating multimodal state encoding, game Wiki knowledge injection, and structured instruction prompting to mitigate hallucination and improve collaboration efficiency; and (iii) a lightweight HTTP/JSON communication protocol with a dedicated action parser. Contribution/Results: Experiments on macro-strategic planning and micro-tactical execution tasks show that LLMs can achieve victories in complex scenarios but do not generate correct decisions consistently, particularly in the full action space and in MA settings. The framework provides a reproducible, scalable benchmark for LLM-driven real-time strategy decision-making.

📝 Abstract
Large language models (LLMs) have demonstrated tremendous potential in intelligent decision-making problems, with unprecedented capabilities shown across diverse applications ranging from gaming AI systems to complex strategic planning frameworks. However, the StarCraft II platform, which has been widely adopted for validating decision-making algorithms over the past decade, has not yet provided substantial support for this emerging domain. To address two issues, that LLMs cannot interface with the hundreds of actions of the pysc2 backend and that native support for multi-agent (MA) collaboration is lacking, we propose the LLM-PySC2 environment. This is the first environment that offers LLMs the complete pysc2 action space together with sufficient multi-modal information and game Wiki knowledge. With an asynchronous query architecture, the environment interacts with LLMs efficiently and maintains a constant latency regardless of the size of the agent population. In the experiments, we evaluated LLMs' decision-making performance in both macro-decision and micro-operation scenarios, using traditional StarCraft II Multi-Agent Challenge (SMAC) tasks and a series of newly proposed ones. Results indicate that LLMs have the potential to achieve victories in complex scenarios but cannot consistently generate correct decisions, especially in the recovered pysc2 action space and MA settings. Without task-relevant instructions, the pre-trained models suffer from issues such as hallucinations and inefficient collaboration. Our findings suggest that StarCraft II remains challenging in the era of large models and that much work remains to develop an advanced LLM decision-making system; the proposed LLM-PySC2 environment will support future development of LLM-based decision-making solutions.

Problem

Research questions and friction points this paper is trying to address.

LLMs lack an interface to the hundreds of actions in the pysc2 backend
No native support for multi-agent collaboration in StarCraft II
Pre-trained models suffer from hallucinations and inefficient collaboration

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-PySC2 exposes the complete pysc2 action space to LLMs
Asynchronous query architecture keeps latency constant regardless of the number of agents
Provides multi-modal information and game Wiki knowledge
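
The constant-latency claim for the asynchronous query architecture can be illustrated with a minimal sketch. This is not the LLM-PySC2 implementation: `query_llm`, `query_all`, and `timed_round` are hypothetical names, and the LLM call is simulated with a fixed sleep. The sketch only shows the general idea that issuing all agents' queries concurrently makes one decision round take roughly as long as a single query, rather than growing with the number of agents.

```python
import asyncio
import time


async def query_llm(agent_name: str, observation: str, delay: float) -> str:
    """Hypothetical stand-in for a remote LLM call (not an LLM-PySC2 API)."""
    await asyncio.sleep(delay)  # simulates network and inference latency
    return f"{agent_name}: action for {observation}"


async def query_all(observations: dict[str, str], delay: float) -> dict[str, str]:
    """Send one query per agent concurrently and collect every reply."""
    names = list(observations)
    replies = await asyncio.gather(
        *(query_llm(name, observations[name], delay) for name in names)
    )
    return dict(zip(names, replies))


def timed_round(num_agents: int, delay: float = 0.05) -> tuple[dict[str, str], float]:
    """Run one decision round and return the replies and the elapsed seconds."""
    observations = {f"agent_{i}": f"obs_{i}" for i in range(num_agents)}
    start = time.perf_counter()
    replies = asyncio.run(query_all(observations, delay))
    return replies, time.perf_counter() - start


if __name__ == "__main__":
    for n in (1, 4, 16):
        _, elapsed = timed_round(n)
        print(f"{n:2d} agents: {elapsed:.3f} s")
```

In this sketch the elapsed time per round should stay close to the simulated per-query delay as the agent count grows, whereas a sequential loop over agents would scale linearly. A real deployment would additionally be bounded by the LLM provider's rate limits and throughput, which the sketch does not model.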
Zongyuan Li
College of Artificial Intelligence, Nankai University
Yanan Ni
Laboratory for Big Data and Decision, National University of Defense Technology
Runnan Qi
Laboratory for Big Data and Decision, National University of Defense Technology
Lumin Jiang
Laboratory for Big Data and Decision, National University of Defense Technology
Chang Lu
College of Artificial Intelligence, Nankai University
Xiaojie Xu
College of Artificial Intelligence, Nankai University
Xiangbei Liu
College of Artificial Intelligence, Nankai University
Pengfei Li
College of Artificial Intelligence, Nankai University
Yunzheng Guo
College of Artificial Intelligence, Nankai University
Zhe Ma
College of Artificial Intelligence, Nankai University
Xian Guo
College of Artificial Intelligence, Nankai University
Kuihua Huang
Laboratory for Big Data and Decision, National University of Defense Technology
Xuebo Zhang
Ph.D., Professor, Institute of Robotics, Nankai University, China
Visual servoing, mobile robotics, motion planning, SLAM, game AI