Monocle: Hybrid Local-Global In-Context Evaluation for Long-Text Generation with Uncertainty-Based Active Learning

📅 2025-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Evaluating long-text generation quality faces challenges due to performance degradation of LLM-as-a-Judge when processing excessively long inputs. To address this, we propose a local-global hybrid contextual evaluation paradigm: fine-grained local scoring is first performed on text segments, followed by explicit modeling of global coherence. We further introduce an uncertainty-estimation–driven active learning algorithm that dynamically selects high-value samples for human annotation and integrates human feedback to enhance judgment consistency. This work is the first to systematically unify segmented evaluation, uncertainty-guided sampling, and human feedback injection into a single framework. On multiple long-text evaluation benchmarks, our method significantly outperforms state-of-the-art baselines: global consistency improves by 23.6%, annotation efficiency increases by 3.8×, and both local discrimination accuracy and robustness are concurrently enhanced.

Technology Category

Application Category

📝 Abstract

Assessing the quality of long-form, model-generated text is challenging, even with advanced LLM-as-a-Judge methods, due to performance degradation as input length increases. To address this issue, we propose a divide-and-conquer approach, which breaks down the comprehensive evaluation task into a series of localized scoring tasks, followed by a final global assessment. This strategy allows for more granular and manageable evaluations, ensuring that each segment of the text is assessed in isolation for both coherence and quality, while also accounting for the overall structure and consistency of the entire piece. Moreover, we introduce a hybrid in-context learning approach that leverages human annotations to enhance the performance of both local and global evaluations. By incorporating human-generated feedback directly into the evaluation process, this method allows the model to better align with human judgment. Finally, we develop an uncertainty-based active learning algorithm that efficiently selects data samples for human annotation, thereby reducing annotation costs in practical scenarios. Experimental results show that the proposed evaluation framework outperforms several representative baselines, highlighting the effectiveness of our approach.

Problem

Research questions and friction points this paper is trying to address.

Evaluating long-form generated text quality is challenging due to length-related performance drops

Proposing a hybrid local-global evaluation method with human feedback integration

Reducing annotation costs via uncertainty-based active learning for efficient human input

Innovation

Methods, ideas, or system contributions that make the work stand out.

Divide-and-conquer approach for localized scoring

Hybrid in-context learning with human annotations

Uncertainty-based active learning reduces annotation costs

🔎 Similar Papers

No similar papers found.

Authors to Follow