AI Summary
This work addresses strategic reasoning in high-dimensional decision spaces by proposing a planning paradigm that eschews policy/value networks and explicit dynamics models: lightweight vector-arithmetic planning carried out directly in a semantically aligned embedding space. The method constructs an evaluation embedding space via supervised contrastive learning, so that semantic similarity of outcomes corresponds to Euclidean proximity in the embedding space; introduces a global advantage vector to rank actions; and formulates planning as a vector alignment problem in the latent space. Experiments demonstrate that a shallow search suffices to achieve strong adversarial performance in chess, substantially reducing planning overhead. The core contribution is the reduction of strategic planning to a geometric alignment problem in embedding space, yielding a scalable, interpretable pathway for autonomous reasoning in large language models.
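The supervised contrastive objective mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: it is the standard supervised contrastive loss (treating embeddings of positions with the same outcome label as positives), written in plain NumPy over L2-normalized embeddings.

```python
import numpy as np

def supcon_loss(embs: np.ndarray, labels: list, temperature: float = 0.1) -> float:
    """Supervised contrastive loss (sketch): pull same-label embeddings
    together, push different-label embeddings apart."""
    # L2-normalize so similarity is a dot product on the unit sphere
    z = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(labels)
    total = 0.0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        denom = sum(np.exp(sim[i, j]) for j in range(n) if j != i)
        # average log-likelihood of each positive against all other samples
        total += -sum(np.log(np.exp(sim[i, j]) / denom)
                      for j in positives) / len(positives)
    return total / n

# toy check: clustered-by-label embeddings should score lower (better)
# than the same embeddings with scrambled labels
embs = np.array([[1.0, 0.0], [0.99, 0.1], [-1.0, 0.0], [-0.99, 0.1]])
loss_aligned = supcon_loss(embs, [0, 0, 1, 1])
loss_scrambled = supcon_loss(embs, [0, 1, 0, 1])
print(loss_aligned < loss_scrambled)  # True: coherent labels give lower loss
```

Training with such a loss encourages the geometry the summary describes: outcomes that are semantically similar end up close in the embedding space.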
Abstract
Planning in high-dimensional decision spaces is increasingly being studied through the lens of learned representations. Rather than training policies or value heads, we investigate whether planning can be carried out directly in an evaluation-aligned embedding space. We introduce SOLIS, which learns such a space using supervised contrastive learning. In this representation, outcome similarity is captured by proximity, and a single global advantage vector orients the space from losing to winning regions. Candidate actions are then ranked according to their alignment with this direction, reducing planning to vector operations in latent space. We demonstrate this approach in chess, where SOLIS uses only a shallow search guided by the learned embedding to reach competitive strength under constrained conditions. More broadly, our results suggest that evaluation-aligned latent planning offers a lightweight alternative to traditional dynamics models or policy learning.
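The ranking step described above ("candidate actions are ranked according to their alignment with this direction") reduces to a dot product. The sketch below is an illustrative assumption, not SOLIS itself: the advantage vector is taken as the normalized difference between the mean embeddings of winning and losing outcomes, and candidates are sorted by their projection onto it.

```python
import numpy as np

def advantage_vector(win_embs: np.ndarray, loss_embs: np.ndarray) -> np.ndarray:
    """Unit vector pointing from losing toward winning regions
    (hypothetical construction via class means)."""
    v = win_embs.mean(axis=0) - loss_embs.mean(axis=0)
    return v / np.linalg.norm(v)

def rank_actions(cand_embs: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Indices of candidate action embeddings, best-aligned first."""
    scores = cand_embs @ v  # projection onto the advantage direction
    return np.argsort(-scores)

# toy usage with 4-dimensional embeddings
rng = np.random.default_rng(0)
wins = rng.normal(1.0, 0.1, size=(8, 4))
losses = rng.normal(-1.0, 0.1, size=(8, 4))
v = advantage_vector(wins, losses)
cands = np.stack([np.full(4, 0.9), np.zeros(4), np.full(4, -0.9)])
order = rank_actions(cands, v)
print(order)  # candidate nearest the winning direction ranks first
```

Because scoring a candidate is a single dot product, a shallow search over successor embeddings stays cheap compared with rolling out a learned dynamics model or evaluating a policy/value network per node.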