LLM-PySC2: Starcraft II learning environment for Large Language Models

📅 2024-11-08

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Large language models (LLMs) face challenges in interfacing with PySC2’s full action space and lack native support for multi-agent (MA) coordination in StarCraft II. Method: We introduce the first RL environment enabling direct LLM integration with PySC2’s complete action set. Our approach features: (i) end-to-end LLM–PySC2 integration; (ii) an asynchronous MA interaction architecture optimized for LLMs, incorporating multimodal state encoding, Wikipedia-based knowledge injection, and structured instruction prompting to mitigate hallucination and improve collaboration efficiency; and (iii) a lightweight HTTP/JSON communication protocol with a dedicated action parser. Contribution/Results: Experiments demonstrate significant improvements in LLM performance on both macro-strategic planning and micro-tactical execution tasks. However, critical limitations—particularly decision instability—are revealed. The framework establishes a reproducible, scalable benchmark for LLM-driven real-time strategy decision-making.

📝 Abstract

Large language models (LLMs) have demonstrated tremendous potential in intelligent decision-making problems, with unprecedented capabilities shown across diverse applications ranging from gaming AI systems to complex strategic planning frameworks. However, the StarCraft II platform, which has been widely adopted for validating decision-making algorithms over the past decade, has not yet provided substantial support for this emerging domain. To address two issues, namely that LLMs cannot interface with the hundreds of actions of the pysc2 backend and that native support for multi-agent (MA) collaboration is lacking, we propose the LLM-PySC2 environment. This is the first environment that offers LLMs the complete pysc2 action space together with sufficient multi-modal information and game Wiki knowledge. With an asynchronous query architecture, the environment interacts efficiently with LLMs and maintains constant latency regardless of the size of the agent population. In the experiments, we evaluated LLMs' decision-making performance in both macro-decision and micro-operation scenarios, using traditional StarCraft II Multi-Agent Challenge (SMAC) tasks and a series of newly proposed tasks. Results indicate that LLMs have the potential to achieve victories in complex scenarios but cannot consistently generate correct decisions, especially in the recovered pysc2 action space and in MA settings. Without task-relevant instructions, the pre-trained models suffer from issues such as hallucinations and inefficient collaboration. Our findings suggest that StarCraft II remains challenging in the era of large models and that much work remains to develop an advanced LLM decision-making system. The proposed LLM-PySC2 environment will support future development of LLM-based decision-making solutions.
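The constant-latency claim can be illustrated with a minimal sketch. It assumes that "asynchronous query" means every agent's LLM request is issued concurrently, so one round costs roughly as long as the slowest single request rather than the sum of all requests. The use of `asyncio`, the names `query_llm`, `query_all`, and `timed_round`, and the reply format are illustrative assumptions, not the environment's actual implementation.

```python
import asyncio
import time


async def query_llm(agent_name: str, observation: str, delay: float = 0.05) -> str:
    """Hypothetical stand-in for an LLM request; it sleeps instead of calling an API."""
    await asyncio.sleep(delay)
    return f"{agent_name}: <No_Op()>"


async def query_all(observations: dict[str, str]) -> dict[str, str]:
    """Issue every agent's query concurrently and collect the replies by agent name."""
    names = list(observations)
    replies = await asyncio.gather(
        *(query_llm(name, observations[name]) for name in names)
    )
    return dict(zip(names, replies))


def timed_round(n_agents: int) -> float:
    """Return the wall-clock time of one query round for n_agents agents."""
    observations = {f"agent_{i}": "textual observation" for i in range(n_agents)}
    start = time.perf_counter()
    asyncio.run(query_all(observations))
    return time.perf_counter() - start
```

Under these assumptions, `timed_round(1)` and `timed_round(20)` should both take close to the single simulated delay, whereas a sequential loop would grow linearly with the number of agents.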

Problem

Research questions and friction points this paper is trying to address.

LLMs lack an interface to the hundreds of actions in the pysc2 backend

No native support for multi-agent collaboration in StarCraft II

Without task-relevant instructions, pre-trained models suffer from hallucinations and inefficient collaboration

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-PySC2 exposes the complete pysc2 action space to LLMs

Asynchronous query architecture keeps latency constant regardless of the number of agents

Provides multi-modal information and game Wiki knowledge
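Exposing an action space to an LLM implies a step that turns free-form model output into executable actions. The sketch below shows one way such a step could look. The `<Name(args)>` text format, the `KNOWN_ACTIONS` allow-list, and the `parse_actions` helper are hypothetical and chosen only for illustration; they are not taken from the LLM-PySC2 code base. Dropping unrecognized names is one simple guard against hallucinated actions.

```python
import re
from typing import NamedTuple


class ParsedAction(NamedTuple):
    name: str
    args: tuple[str, ...]


# Hypothetical allow-list: action name -> expected number of arguments.
KNOWN_ACTIONS = {"Attack_Unit": 1, "Move_Screen": 2, "No_Op": 0}

# Matches text such as "<Attack_Unit(0x1001)>" inside a longer reply.
ACTION_PATTERN = re.compile(r"<(\w+)\(([^()<>]*)\)>")


def parse_actions(reply: str) -> list[ParsedAction]:
    """Extract well-formed, allow-listed actions from an LLM reply, in order."""
    actions = []
    for name, raw_args in ACTION_PATTERN.findall(reply):
        args = tuple(a.strip() for a in raw_args.split(",") if a.strip())
        # Skip unknown names and wrong argument counts instead of guessing.
        if KNOWN_ACTIONS.get(name) == len(args):
            actions.append(ParsedAction(name, args))
    return actions
```

For a reply such as `"Focus fire. <Attack_Unit(0x1001)> <Fly_To_Moon()>"`, this sketch keeps the first action and discards the second because its name is not in the allow-list.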
