🤖 AI Summary
Problem: Existing visualization methods (e.g., t-SNE) for interpreting deep reinforcement learning (DRL) policies are unstable and rely on manual annotation, which hinders reliable semantic analysis of policy representations.
Method: We propose a novel DRL architecture that integrates an online semantic clustering module directly into the training pipeline. The module combines feature dimensionality reduction (PCA or VAE) with streaming k-means to discover semantically coherent groupings of state representations while the agent is being trained.
Contribution/Results: Experiments in video game environments show that DRL policies intrinsically exhibit semantic clustering structure. Without manual annotation, the method uncovers the hierarchical structure of policies and the semantic distribution of state representations, and it avoids the instability of t-SNE-based analysis. This supports more principled, scalable, and training-aware explainability for DRL systems.
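To make "dimensionality reduction plus streaming k-means" concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation. It uses PCA (computed by SVD on a hypothetical warm-up batch of encoder features) in place of the PCA/VAE option. Centroids are seeded by farthest-point selection, then updated one point at a time so that each centroid tracks the running mean of the points assigned to it. The class name `OnlineSemanticClusterer`, its methods, and the synthetic "features" in the usage lines are all hypothetical.

```python
import numpy as np


class OnlineSemanticClusterer:
    """Hypothetical sketch: PCA projection followed by streaming (sequential) k-means."""

    def __init__(self, n_components: int, n_clusters: int):
        self.n_components = n_components
        self.n_clusters = n_clusters
        self.mean = None        # feature mean of the warm-up batch
        self.components = None  # top principal directions, shape (n_components, feature_dim)
        self.centroids = None   # cluster centers in the reduced space
        self.counts = None      # number of points each centroid has absorbed

    def fit_projection(self, warmup: np.ndarray) -> None:
        # PCA via SVD of the centered warm-up features (assumed stand-in for PCA/VAE).
        self.mean = warmup.mean(axis=0)
        _, _, vt = np.linalg.svd(warmup - self.mean, full_matrices=False)
        self.components = vt[: self.n_components]

    def project(self, feats: np.ndarray) -> np.ndarray:
        return (feats - self.mean) @ self.components.T

    def _seed(self, z: np.ndarray) -> np.ndarray:
        # Farthest-point seeding: start from the first point, then repeatedly
        # add the point farthest from all centroids chosen so far.
        centroids = [z[0]]
        for _ in range(1, self.n_clusters):
            dist = np.min([((z - c) ** 2).sum(axis=1) for c in centroids], axis=0)
            centroids.append(z[int(np.argmax(dist))])
        return np.array(centroids, dtype=float)

    def partial_fit(self, feats: np.ndarray) -> np.ndarray:
        """Assign each feature vector to a cluster, updating centroids point by point."""
        z = self.project(feats)
        if self.centroids is None:
            self.centroids = self._seed(z)
            self.counts = np.zeros(self.n_clusters)
        labels = np.empty(len(z), dtype=int)
        for i, x in enumerate(z):
            j = int(np.argmin(((self.centroids - x) ** 2).sum(axis=1)))
            self.counts[j] += 1
            # Incremental mean: the centroid becomes the running mean of its assigned points.
            self.centroids[j] += (x - self.centroids[j]) / self.counts[j]
            labels[i] = j
        return labels


# Usage on synthetic "features": three well-separated blobs in 16 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 16)) * 10
truth = rng.permutation(np.repeat(np.arange(3), 100))
feats = centers[truth] + 0.1 * rng.normal(size=(300, 16))

clusterer = OnlineSemanticClusterer(n_components=2, n_clusters=3)
clusterer.fit_projection(feats[:100])
labels = np.concatenate([clusterer.partial_fit(feats[i:i + 50]) for i in range(0, 300, 50)])
```

In a training loop, `partial_fit` would be called on each batch of intermediate-layer features without touching the policy loss. How often the projection is refit as the encoder drifts is an open design choice in this sketch.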
📝 Abstract
In this paper, we investigate the semantic clustering properties of deep reinforcement learning (DRL) for video games, enriching our understanding of the internal dynamics of DRL and advancing its interpretability. In this context, semantic clustering refers to the inherent capacity of neural networks to internally group video inputs based on semantic similarity. To study it, we propose a novel DRL architecture that integrates a semantic clustering module featuring both feature dimensionality reduction and online clustering. This module integrates seamlessly into the DRL training pipeline, addressing the instability observed in previous t-SNE-based analysis methods and eliminating the need for extensive manual annotation in semantic analysis. Through experiments, we validate the effectiveness of the proposed module and confirm the semantic clustering properties of DRL for video games. Additionally, based on these properties, we introduce new analytical methods to help understand the hierarchical structure of policies and the semantic distribution within the feature space.
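Once visited states carry cluster labels, simple tabulations can relate semantic groups to policy behavior. The sketch below is a hypothetical illustration of one such analysis, not the analytical methods introduced in the paper. For each cluster it computes the empirical distribution of actions the agent took and the share of the most frequent action. The function names `cluster_action_profile` and `action_purity`, the toy data, and the assumption of a discrete action space are all illustrative.

```python
import numpy as np


def cluster_action_profile(labels, actions, n_clusters, n_actions):
    """Row k holds the empirical distribution of actions taken in states of cluster k."""
    counts = np.zeros((n_clusters, n_actions))
    np.add.at(counts, (labels, actions), 1)  # unbuffered scatter-add handles repeated pairs
    totals = counts.sum(axis=1, keepdims=True)
    # Empty clusters keep an all-zero row instead of dividing by zero.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


def action_purity(profile):
    """Share of the most frequent action in each cluster (1.0 means a single action)."""
    return profile.max(axis=1)


labels = np.array([0, 0, 0, 1, 1, 2])   # cluster id per visited state (toy data)
actions = np.array([3, 3, 1, 0, 0, 2])  # discrete action taken in that state (toy data)
profile = cluster_action_profile(labels, actions, n_clusters=4, n_actions=4)
print(action_purity(profile))
```

High purity suggests that a cluster corresponds to a consistent behavior. Low purity may flag clusters in which the policy conditions on information that the clustering does not separate.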