General Modular Harness for LLM Agents in Multi-Turn Gaming Environments

📅 2025-07-15
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the weak generalization and heavy reliance on domain-specific engineering exhibited by large language models (LLMs) and vision-language models (VLMs) in multi-turn interactive game environments. We propose the first general-purpose modular framework that decouples perception, memory, and reasoning into independent, interchangeable components—enabling plug-and-play integration of arbitrary LLM or VLM backbones without task-specific customization. Evaluated uniformly across classic (e.g., Zork) and modern (e.g., ALFWorld, VoxSim) game benchmarks, our framework reveals systematic component contributions: memory dominates performance gains in long-horizon puzzles, while perception is critical under high visual interference. Experiments demonstrate consistent outperformance over end-to-end baselines across diverse tasks, significantly improving robustness and adaptability in dynamic, interactive settings. The framework establishes an interpretable, scalable architectural paradigm for general embodied intelligence.

Technology Category

Application Category

📝 Abstract
We introduce a modular harness design for LLM agents that composes of perception, memory, and reasoning components, enabling a single LLM or VLM backbone to tackle a wide spectrum of multi turn gaming environments without domain-specific engineering. Using classic and modern game suites as low-barrier, high-diversity testbeds, our framework provides a unified workflow for analyzing how each module affects performance across dynamic interactive settings. Extensive experiments demonstrate that the harness lifts gameplay performance consistently over un-harnessed baselines and reveals distinct contribution patterns, for example, memory dominates in long-horizon puzzles while perception is critical in vision noisy arcades. These findings highlight the effectiveness of our modular harness design in advancing general-purpose agent, given the familiarity and ubiquity of games in everyday human experience.
Problem

Research questions and friction points this paper is trying to address.

Design modular harness for LLM agents in gaming environments
Analyze module impact on performance in dynamic settings
Improve gameplay performance over un-harnessed baselines
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular harness with perception, memory, reasoning
Unified workflow for multi-turn gaming analysis
Boosts performance in diverse game environments
🔎 Similar Papers
No similar papers found.
Y
Yuxuan Zhang
Halıcıoğlu Data Science Institute (HDSI), University of California San Diego, La Jolla, CA, USA
H
Haoyang Yu
Halıcıoğlu Data Science Institute (HDSI), University of California San Diego, La Jolla, CA, USA
Lanxiang Hu
Lanxiang Hu
University of California, San Diego
Machine LearningDistributed SystemsEmbedded Systems
Haojian Jin
Haojian Jin
University of California San Diego
Human-Computer InteractionUbiquitous ComputingSecurity & PrivacyMobile Computing
H
Hao Zhang