Grammar and Gameplay-aligned RL for Game Description Generation with LLMs

📅 2025-03-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address prevalent syntax errors and low fidelity to game rules in natural language (NL)–to–Game Description Language (GDL) generation, this paper proposes a two-stage fine-tuning paradigm: first supervised fine-tuning (SFT) on large language models (LLMs), followed by dual-objective reinforcement learning (RL). We innovatively design jointly optimized rewards—(i) a syntax reward driven by a formal parser to enforce structural correctness, and (ii) a game-concept reward computed by a semantic alignment scorer to preserve rule semantics—both optimized concurrently via Proximal Policy Optimization (PPO). Evaluated across multiple GDL benchmarks, our method significantly outperforms the SFT-only baseline: syntax correctness improves by 27%, and fidelity to critical game concepts increases by 31%. To our knowledge, this is the first RL framework for NL→GDL generation that explicitly integrates structured syntactic constraints with deep semantic alignment.

Technology Category

Application Category

📝 Abstract
Game Description Generation (GDG) is the task of generating a game description written in a Game Description Language (GDL) from natural language text. Previous studies have explored generation methods leveraging the contextual understanding capabilities of Large Language Models (LLMs); however, accurately reproducing the game features of the game descriptions remains a challenge. In this paper, we propose reinforcement learning-based fine-tuning of LLMs for GDG (RLGDG). Our training method simultaneously improves grammatical correctness and fidelity to game concepts by introducing both grammar rewards and concept rewards. Furthermore, we adopt a two-stage training strategy where Reinforcement Learning (RL) is applied following Supervised Fine-Tuning (SFT). Experimental results demonstrate that our proposed method significantly outperforms baseline methods using SFT alone.
Problem

Research questions and friction points this paper is trying to address.

Generates game descriptions from natural language using LLMs.
Improves grammatical correctness and game concept fidelity.
Uses reinforcement learning after supervised fine-tuning for better results.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning fine-tunes LLMs for GDG
Grammar and concept rewards enhance model accuracy
Two-stage training combines SFT and RL effectively
🔎 Similar Papers
No similar papers found.