Agreement in Representation Space for Open-Ended Self-Consistency

📅 2026-06-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing self-consistency methods rely on exact matching, which limits them to tasks with categorical outputs and excludes open-ended generation tasks such as code generation and text summarization. This work proposes Embedding-Based Agreement (EBA), a training-free consistency measure that treats self-consistency as a geometric property of the representation space of generated outputs. By clustering multiple sampled generations in an embedding space, EBA estimates semantic agreement without additional training or an LLM evaluator. Experiments on mathematical reasoning, code generation, and summarization show that EBA consistently outperforms random selection and scales more stably than recent selection approaches based on LLM evaluation or uncertainty estimation, with agreement signals that remain stable across model families and embedding spaces. The analysis also finds a strong correlation between generation quality and geometric position: generations near central regions of the embedding space tend to be more reliable, while peripheral generations are substantially less accurate.

📝 Abstract

Self-consistency improves LLM reasoning by sampling multiple outputs and selecting the most consistent answer, but existing formulations largely rely on exact matching and therefore remain limited to tasks with categorical outputs. In this work, we study self-consistency in open-ended generation tasks such as code synthesis and text summarization. We hypothesize that consistency can be understood as a geometric property of the generation space, where semantically compatible generations concentrate in similar regions of representation space. To study this hypothesis, we introduce Embedding-Based Agreement (EBA), a simple training-free operationalization that estimates agreement by clustering sampled generations in embedding space. Through experiments on mathematical reasoning, code generation, and summarization, we show that agreement in representation space provides a robust and scalable signal of self-consistency for open-ended tasks. In particular, EBA consistently outperforms random selection and exhibits more stable scaling behavior than recent selection approaches based on LLM evaluation or uncertainty estimation. We further show that these agreement signals remain stable across model families and embedding spaces, even with native hidden representations. Finally, our analysis shows that the geometric location occupied by sampled generations is strongly correlated with generation quality: generations concentrated near central regions of representation space tend to correspond to more reliable outputs, whereas peripheral generations are substantially less accurate. Overall, our findings support viewing self-consistency as a property of the geometric organization of sampled generations rather than exact symbolic overlap.
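The selection idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, so the code below substitutes a simple thresholded-neighbourhood rule: each sample's agreement is the number of other samples whose cosine similarity to it exceeds a threshold, and ties are broken by mean similarity within that neighbourhood. The function name `select_by_agreement`, the `threshold` parameter and its default value, and the toy embeddings are all hypothetical and are not taken from the paper.

```python
import numpy as np


def select_by_agreement(embeddings: np.ndarray, threshold: float = 0.8) -> int:
    """Return the index of the sample with the highest agreement score.

    Assumption: agreement is approximated by neighbourhood size under a
    cosine-similarity threshold, a stand-in for the paper's clustering step.
    """
    # Normalize rows so that dot products equal cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T

    # A sample's neighbourhood holds every sample (itself included)
    # whose similarity to it reaches the threshold.
    neighbours = sim >= threshold
    counts = neighbours.sum(axis=1)

    # Mean similarity inside the neighbourhood, used only to break ties.
    within_mean = (sim * neighbours).sum(axis=1) / np.maximum(counts, 1)

    # np.lexsort treats the last key as primary: sort by counts, then by
    # within_mean, both ascending, and take the final (best) index.
    order = np.lexsort((within_mean, counts))
    return int(order[-1])


# Toy embeddings: three generations that agree and one outlier.
toy_embeddings = np.array(
    [
        [1.00, 0.00],
        [0.90, 0.10],
        [0.95, 0.05],
        [0.00, 1.00],
    ]
)
selected_index = select_by_agreement(toy_embeddings)
```

In this toy input the three mutually similar vectors form the dense region, so the outlier at index 3 is not selected. In practice the rows would come from an embedding model or from the generator's own hidden states, and the threshold would need tuning for that embedding space.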

Problem

Research questions and friction points this paper is trying to address.

self-consistency

open-ended generation

representation space

agreement

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

self-consistency

representation space

embedding-based agreement