Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation

📅 2024-09-11
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This paper addresses the challenge of effectively integrating human expertise with large language model (LLM) capabilities in checklist-based text evaluation. To this end, the authors propose InteractEval, a framework that fuses human and LLM Think-Aloud (TA) outcomes to generate evaluation attributes along four dimensions: Coherence, Fluency, Consistency, and Relevance. Methodologically, humans contribute fine-grained judgments of internal quality (Coherence and Fluency), while LLMs are stronger on attributes of external alignment (Consistency and Relevance); the TA protocol promotes divergent thinking in both, yielding a broader set of relevant attributes. Experiments show that InteractEval outperforms both non-LLM-based and LLM-based baselines across all four dimensions, indicating that human-LLM collaboration improves both evaluation accuracy and attribute coverage.

📝 Abstract
This study introduces InteractEval, a framework that integrates human expertise and Large Language Models (LLMs) using the Think-Aloud (TA) method to generate attributes for checklist-based text evaluation. By combining human flexibility and reasoning with LLM consistency, InteractEval outperforms traditional non-LLM-based and LLM-based baselines across four distinct dimensions: Coherence, Fluency, Consistency, and Relevance. The experiment also investigates the effectiveness of the TA method, showing that it promotes divergent thinking in both humans and LLMs, leading to the generation of a wider range of relevant attributes and enhanced text evaluation performance. Comparative analysis reveals that humans excel at identifying attributes related to internal quality (Coherence and Fluency), whereas LLMs perform better on attributes related to external alignment (Consistency and Relevance). Consequently, leveraging humans and LLMs together produces the best evaluation outcomes. In short, this study demonstrates the necessity of effectively combining humans and LLMs in an automated checklist-based text evaluation framework. The code is available at https://github.com/BBeeChu/InteractEval.git.
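To make the abstract's pipeline concrete, here is a minimal sketch of checklist-based evaluation in the spirit of InteractEval: attributes elicited from humans and from an LLM are merged into one checklist per dimension, each item is scored, and scores are averaged. All names (`ChecklistItem`, `merge_attributes`, `evaluate`) and the injected scoring stub are illustrative assumptions, not the authors' implementation; in the real framework the per-item scorer would be an LLM call.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChecklistItem:
    dimension: str   # e.g. "Coherence", "Fluency", "Consistency", "Relevance"
    attribute: str   # a quality attribute elicited via the Think-Aloud method
    source: str      # "human" or "llm"

def merge_attributes(human, llm, dimension):
    """Fuse human- and LLM-generated attributes, dropping duplicates."""
    items, seen = [], set()
    for source, attrs in (("human", human), ("llm", llm)):
        for attr in attrs:
            key = attr.strip().lower()
            if key not in seen:
                seen.add(key)
                items.append(ChecklistItem(dimension, attr, source))
    return items

def evaluate(text, checklist, score_item):
    """Average per-item scores (0-1) into a single dimension score.

    `score_item(text, item)` stands in for an LLM judgment here so the
    sketch stays self-contained and runnable.
    """
    scores = [score_item(text, item) for item in checklist]
    return sum(scores) / len(scores)

# Usage with a trivial keyword-based stand-in for the LLM scorer.
checklist = merge_attributes(
    human=["sentences flow logically", "no abrupt topic shifts"],
    llm=["No abrupt topic shifts", "clear referential links"],
    dimension="Coherence",
)
score = evaluate(
    "The summary follows the source order with clear links.",
    checklist,
    score_item=lambda text, item: 1.0 if "clear" in text else 0.0,
)
```

Note that the duplicate attribute contributed by both sources is kept only once, which mirrors the fusion step the summary describes at a very high level.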
Problem

Research questions and friction points this paper is trying to address.

Combining human and LLM insights for text evaluation.
Enhancing text evaluation through Think-Aloud method.
Integrating human flexibility with LLM consistency.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines human and LLM Think-Aloud outcomes.
Enhances text evaluation performance.
Leverages complementary human and LLM strengths.
SeongYeub Chu
Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
JongWoo Kim
Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
Mun Yong
Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea