🤖 AI Summary
This study addresses the lack of automated, quantitative evaluation methods for assessing the impact of procedural content generation (PCG) on player experience in serious games. We propose the first modular PCG evaluation framework embedding a deep reinforcement learning (DRL) test agent. Specifically, a Proximal Policy Optimization (PPO)-driven agent acts as a proxy player within a card-based serious game simulator, dynamically executing comparative evaluations across multiple PCG variants (e.g., NPC generation) without human intervention. Our key contribution is the systematic integration of DRL agents into a closed-loop PCG evaluation pipeline, enabling environment adaptation and real-time performance measurement. Experimental results demonstrate that the DRL agent achieves 97% win rates on PCG versions 2 and 3—significantly outperforming version 1’s 94% (p = 0.0009)—while also accelerating policy convergence by 23%.
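The closed-loop pipeline described above can be sketched as follows. This is a minimal illustrative stand-in, not the paper's implementation: the environment, class names, and the stubbed win probabilities (taken from the reported peak win rates) are all assumptions, and the trained PPO policy is replaced by a placeholder so the sketch stays self-contained.

```python
import random

class CardGameEnv:
    """Stub card-game environment; one PCG variant (NPC generator) per instance.

    Hypothetical sketch: a real environment would expose game states and
    actions to the DRL agent; here each episode just draws a win/loss.
    """
    def __init__(self, pcg_version, seed=0):
        self.pcg_version = pcg_version
        self.rng = random.Random(seed)

    def play_episode(self, policy):
        # Win probabilities below are the peak win rates reported in the
        # summary, used only to make the stub produce plausible numbers.
        win_prob = {1: 0.94, 2: 0.97, 3: 0.97}[self.pcg_version]
        return self.rng.random() < win_prob

def evaluate(env, policy, n_games=1000):
    """Win rate = wins / games played, measured without human intervention."""
    wins = sum(env.play_episode(policy) for _ in range(n_games))
    return wins / n_games

# Closed-loop comparison across the three PCG variants.
results = {v: evaluate(CardGameEnv(v, seed=v), policy=None) for v in (1, 2, 3)}
```

In the actual framework the `policy` argument would be a PPO agent trained per variant; the comparison logic is otherwise the same loop over variants.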
📝 Abstract
Serious games (SGs) are increasingly incorporating procedural content generation (PCG) into the development process as a means of offering a personalized and enhanced player experience. However, developing a framework to assess the impact of PCG techniques when integrated into SGs remains particularly challenging. This study proposes a methodology for the automated evaluation of PCG integration in SGs, incorporating deep reinforcement learning (DRL) game-testing agents. To validate the proposed framework, a previously introduced SG featuring card game mechanics and incorporating three different versions of PCG for nonplayer character (NPC) creation was deployed. Version 1 features random NPC creation, while Versions 2 and 3 use a genetic algorithm approach. These versions are used to test the impact of different dynamic SG environments on the proposed framework's agents. The obtained results highlight the superiority of the DRL game-testing agents trained on Versions 2 and 3 over those trained on Version 1 in terms of win rate (i.e., the ratio of wins to games played) and training time. More specifically, in a test emulating regular gameplay, Versions 2 and 3 both peaked at a 97% win rate and achieved statistically significantly higher win rates (p = 0.0009) than Version 1, which peaked at 94%. Overall, the results support the proposed framework's capability to produce meaningful data for the evaluation of procedurally generated content in SGs.
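The reported significance of the win-rate difference (97% vs. 94%) can be illustrated with a standard two-proportion z-test. The per-version game count is not given in this excerpt, so n = 1000 below is a placeholder assumption purely for illustration; the resulting p-value will therefore not exactly match the paper's p = 0.0009.

```python
import math

def two_proportion_z_test(wins_a, n_a, wins_b, n_b):
    """Two-sided two-proportion z-test for a difference in win rates."""
    p_a, p_b = wins_a / n_a, wins_b / n_b
    p_pool = (wins_a + wins_b) / (n_a + n_b)              # pooled proportion
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))            # two-sided p-value
    return z, p_value

# 97% vs. 94% win rate under the placeholder sample size n = 1000:
z, p = two_proportion_z_test(970, 1000, 940, 1000)
```

With larger (or smaller) true sample sizes the p-value shrinks (or grows) accordingly, which is why the exact n matters for reproducing the reported figure.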