Monocle: Hybrid Local-Global In-Context Evaluation for Long-Text Generation with Uncertainty-Based Active Learning

📅 2025-05-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Evaluating long-text generation quality faces challenges due to performance degradation of LLM-as-a-Judge when processing excessively long inputs. To address this, we propose a local-global hybrid contextual evaluation paradigm: fine-grained local scoring is first performed on text segments, followed by explicit modeling of global coherence. We further introduce an uncertainty-estimation–driven active learning algorithm that dynamically selects high-value samples for human annotation and integrates human feedback to enhance judgment consistency. This work is the first to systematically unify segmented evaluation, uncertainty-guided sampling, and human feedback injection into a single framework. On multiple long-text evaluation benchmarks, our method significantly outperforms state-of-the-art baselines: global consistency improves by 23.6%, annotation efficiency increases by 3.8×, and both local discrimination accuracy and robustness are concurrently enhanced.

Technology Category

Application Category

📝 Abstract
Assessing the quality of long-form, model-generated text is challenging, even with advanced LLM-as-a-Judge methods, due to performance degradation as input length increases. To address this issue, we propose a divide-and-conquer approach, which breaks down the comprehensive evaluation task into a series of localized scoring tasks, followed by a final global assessment. This strategy allows for more granular and manageable evaluations, ensuring that each segment of the text is assessed in isolation for both coherence and quality, while also accounting for the overall structure and consistency of the entire piece. Moreover, we introduce a hybrid in-context learning approach that leverages human annotations to enhance the performance of both local and global evaluations. By incorporating human-generated feedback directly into the evaluation process, this method allows the model to better align with human judgment. Finally, we develop an uncertainty-based active learning algorithm that efficiently selects data samples for human annotation, thereby reducing annotation costs in practical scenarios. Experimental results show that the proposed evaluation framework outperforms several representative baselines, highlighting the effectiveness of our approach.
Problem

Research questions and friction points this paper is trying to address.

Evaluating long-form generated text quality is challenging due to length-related performance drops
Proposing a hybrid local-global evaluation method with human feedback integration
Reducing annotation costs via uncertainty-based active learning for efficient human input
Innovation

Methods, ideas, or system contributions that make the work stand out.

Divide-and-conquer approach for localized scoring
Hybrid in-context learning with human annotations
Uncertainty-based active learning reduces annotation costs
🔎 Similar Papers
No similar papers found.
X
Xiaorong Wang
Beijing Jiaotong University
T
Ting Yang
Beijing University of Posts and Telecommunications
Z
Zhu Zhang
Tsinghua University
S
Shuo Wang
Tsinghua University
Z
Zihan Zhou
Xiamen University
Liner Yang
Liner Yang
Associate Professor, Beijing Language and Culture University
Artificial IntelligenceNatural Language Processing
Z
Zhiyuan Liu
Tsinghua University
Maosong Sun
Maosong Sun
Professor of Computer Science and Technology, Tsinghua University
Natural Language ProcessingArtificial IntelligenceSocial Computing