🤖 AI Summary
This work addresses the challenge of assessing holistic coherence in long texts. We propose BBScore, a reference-free metric that requires no human-written references and no end-to-end model training. Its core innovation is the application of Brownian bridge stochastic processes to text modeling: sentence embeddings are treated as points along a continuous path, and inter-sentence sequential coordination is quantified via deviation from the bridge's expected trajectory, thereby unifying local semantic cohesion and global thematic consistency. BBScore relies solely on off-the-shelf sentence embeddings, optionally paired with a lightweight classification component, so no full model needs to be fine-tuned. Experiments demonstrate that BBScore matches state-of-the-art supervised methods on standard coherence discrimination tasks, robustly distinguishes human-authored texts from the outputs of diverse large language models (LLMs), and generalizes across domains when identifying LLM-specific writing styles.
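To make the bridge intuition concrete, here is a minimal, illustrative sketch (not the paper's exact formula): a Brownian bridge pinned at the first and last sentence embeddings has expected position given by linear interpolation and variance proportional to t·(1−t) at relative position t, so a simple deviation score can normalize each sentence's distance from the interpolated path by that variance profile. The function name and normalization details below are assumptions for illustration.

```python
import numpy as np

def bb_deviation_score(embeddings, eps=1e-8):
    """Illustrative Brownian-bridge deviation score (a sketch, not BBScore itself).

    Sentence embeddings are treated as points on a path pinned at the first
    and last sentence. A coherent text should stay close to the bridge's
    expected trajectory; its variance at relative position t is ~ t * (1 - t).
    """
    X = np.asarray(embeddings, dtype=float)
    n = len(X)
    ts = np.linspace(0.0, 1.0, n)            # relative position of each sentence
    # Expected bridge position: linear interpolation between the two endpoints
    expected = np.outer(1.0 - ts, X[0]) + np.outer(ts, X[-1])
    sq_dev = np.sum((X - expected) ** 2, axis=1)
    var = ts * (1.0 - ts) + eps              # bridge variance profile
    # Average variance-normalized deviation over interior sentences
    # (the endpoints are pinned, so they contribute no signal)
    return float(np.mean(sq_dev[1:-1] / var[1:-1]))
```

A text whose embeddings drift smoothly between its opening and closing sentences scores near zero, while erratic jumps inflate the normalized deviation; the paper's metric builds this kind of signal from embeddings alone, with an optional lightweight classifier on top.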
📝 Abstract
Measuring the coherence of text is a vital aspect of evaluating the quality of written content. Recent advancements in neural coherence modeling have demonstrated their efficacy in capturing entity coreference and discourse relations, thereby enhancing coherence evaluation. However, many existing methods heavily depend on static embeddings or focus narrowly on nearby context, constraining their capacity to measure the overarching coherence of long texts.
In this paper, we posit that coherent texts inherently exhibit a sequential and cohesive interplay among sentences, effectively conveying the central theme, purpose, or standpoint. To capture this abstract relationship, we introduce BBScore, a novel reference-free metric grounded in Brownian bridge theory for assessing text coherence. Our results show that, when combined with a simple additional classification component, this metric attains performance comparable to state-of-the-art techniques on standard artificial discrimination tasks.
In downstream tasks, we further show that this metric effectively differentiates human-written documents from text generated by large language models within specific domains. We also demonstrate its efficacy in detecting the writing styles of various large language models, underscoring its potential for generalization. In summary, we present a novel Brownian bridge coherence metric that measures both local and global text coherence while circumventing the need for end-to-end model training, a flexibility that allows it to be applied to various downstream tasks.