🤖 AI Summary
This study addresses the problem of learning compact, semantically meaningful representations of chess positions from sequences of consecutive game positions without supervision. The authors propose a self-supervised learning framework that combines ideas from Masked Autoencoders (MAE), Joint-Embedding Predictive Architecture (JEPA), and BERT. The model masks large portions of a sequence of latent board states and predicts the missing states in a low-dimensional embedding space, so it encodes positional semantics without relying on reinforcement learning. The authors describe the combined MAE–JEPA–BERT architecture as novel and apply it to sequential board-game data, where it captures piece-movement logic through self-supervision alone. The reported experiments show that meaningful chess concepts emerge as clusters in the learned representation space, that reconstructions of masked board states reflect reasoning about piece movements, and that game trajectories through this space support quick, intuitive analysis of games.
📝 Abstract
In this paper, we introduce Representation Prediction via Autoencoding using Iterative Refinement (RePAIR), a novel self-supervised representation learning architecture that synthesizes Masked Autoencoders (MAE), Joint Embedding Predictive Architectures (JEPA), and Bidirectional Encoder Representations from Transformers (BERT). We demonstrate how it can be used to encode objects in sequential data, such as consecutive chess positions, into compact yet meaningful representations. The basic principle of the architecture is to mask large portions of a sequence of latent states, similar to BERT and MAE. We then apply a lightweight Predictor to the latent representations, which repairs the gaps in the sequence in a lower-dimensional embedding space, akin to JEPA. Our experiments in the domain of chess show that the Encoder refines the board representations such that meaningful chess concepts emerge as clusters in the latent space. Furthermore, reconstructions of the masked board states show that the model is able to reason about piece movements without relying on costly reinforcement learning methods. Lastly, we find that the resulting representation space allows for quick and intuitive dissection of chess games by observing the trajectories that games trace through this semantically rich space.
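The masking-and-repair principle described above can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation: the linear `encode` function, the 64-feature board vectors, the 75% mask ratio, and all function names are assumptions made for illustration. In particular, `predict_masked` is a stand-in that interpolates between the nearest visible latents, whereas the paper's Predictor is a learned module, and the iterative refinement step is not modeled here.

```python
import random


def encode(board, weights):
    """Hypothetical linear encoder: board feature vector -> low-dimensional latent."""
    return [sum(w * x for w, x in zip(row, board)) for row in weights]


def mask_indices(seq_len, mask_ratio, rng):
    """Choose a large random subset of sequence positions to hide."""
    n_masked = int(seq_len * mask_ratio)
    return sorted(rng.sample(range(seq_len), n_masked))


def predict_masked(latents, masked):
    """Stand-in predictor (assumption): fill each masked latent by linear
    interpolation between its nearest visible neighbours in the sequence."""
    masked_set = set(masked)
    visible = [i for i in range(len(latents)) if i not in masked_set]
    filled = {}
    for i in masked:
        left = max((v for v in visible if v < i), default=None)
        right = min((v for v in visible if v > i), default=None)
        if left is None:
            # No visible state earlier in the sequence: copy the next visible one.
            filled[i] = list(latents[right])
        elif right is None:
            # No visible state later in the sequence: copy the previous visible one.
            filled[i] = list(latents[left])
        else:
            t = (i - left) / (right - left)
            filled[i] = [(1 - t) * a + t * b
                         for a, b in zip(latents[left], latents[right])]
    return filled


def latent_loss(latents, predictions):
    """Mean squared error in latent space, computed over masked positions only."""
    total, count = 0.0, 0
    for i, pred in predictions.items():
        for p, z in zip(pred, latents[i]):
            total += (p - z) ** 2
            count += 1
    return total / count


# Toy data: 8 consecutive "positions", each a 64-dimensional feature vector.
rng = random.Random(0)
feat_dim, latent_dim, seq_len = 64, 4, 8
weights = [[rng.gauss(0, 0.1) for _ in range(feat_dim)] for _ in range(latent_dim)]
boards = [[rng.choice([-1.0, 0.0, 1.0]) for _ in range(feat_dim)]
          for _ in range(seq_len)]

latents = [encode(b, weights) for b in boards]      # sequence of latent states
masked = mask_indices(seq_len, 0.75, rng)           # hide most of the sequence
predictions = predict_masked(latents, masked)       # repair the gaps
loss = latent_loss(latents, predictions)            # compare in latent space
```

The point of the sketch is where the loss is computed: predictions are compared with the encoded states in the low-dimensional latent space, as in JEPA, rather than with the raw board inputs. In a trained system, `loss` would drive updates to a learned Predictor and Encoder.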