ReviewAgents: Bridging the Gap Between Human and AI-Generated Paper Reviews

📅 2025-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: AI-generated peer-review comments still lag behind human reviewers in comprehensiveness, factual accuracy, and logical consistency. Method: This paper introduces ReviewAgents, a framework comprising (i) Review-CoT, a novel structured-reasoning dataset for review generation; (ii) a multi-role large language model (LLM) collaboration mechanism for peer review; and (iii) ReviewBench, a dedicated evaluation benchmark. The approach combines relevant-paper-aware training, chain-of-thought (CoT)-guided structured fine-tuning, and role-specialized multi-agent coordination. Contribution/Results: On ReviewBench, ReviewAgents outperforms advanced LLMs in comprehensiveness, accuracy, and reasoning consistency, producing reviews closer to those of human experts and narrowing, though not yet closing, the gap between AI- and human-generated reviews.
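
The staged reasoning the summary describes (summarize the paper, relate it to prior work, weigh strengths and weaknesses, then conclude) can be pictured as a structured training target. Below is a minimal Python sketch of such a target; the class, field, and tag names are illustrative assumptions, not the actual Review-CoT schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StructuredReview:
    """Hypothetical container mirroring the staged reasoning described above:
    summary -> related work -> strengths/weaknesses -> conclusion."""
    paper_summary: str
    related_work_notes: List[str] = field(default_factory=list)
    strengths: List[str] = field(default_factory=list)
    weaknesses: List[str] = field(default_factory=list)
    conclusion: str = ""


def to_cot_target(review: StructuredReview) -> str:
    """Serialize the staged review into a single chain-of-thought style string,
    the kind of target a CoT-guided fine-tuning pass might train on."""
    return "\n".join([
        f"[SUMMARY] {review.paper_summary}",
        "[RELATED WORK] " + "; ".join(review.related_work_notes),
        "[STRENGTHS] " + "; ".join(review.strengths),
        "[WEAKNESSES] " + "; ".join(review.weaknesses),
        f"[CONCLUSION] {review.conclusion}",
    ])
```

A Review-CoT-style dataset entry would presumably pair the paper text (plus retrieved related papers) with one such serialized target per human review, though the exact format is not specified here.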

📝 Abstract
Academic paper review is a critical yet time-consuming task within the research community. With the increasing volume of academic publications, automating the review process has become a significant challenge. The primary issue lies in generating comprehensive, accurate, and reasoning-consistent review comments that align with human reviewers' judgments. In this paper, we address this challenge by proposing ReviewAgents, a framework that leverages large language models (LLMs) to generate academic paper reviews. We first introduce a novel dataset, Review-CoT, consisting of 142k review comments, designed for training LLM agents. This dataset emulates the structured reasoning process of human reviewers: summarizing the paper, referencing relevant works, identifying strengths and weaknesses, and generating a review conclusion. Building upon this, we train LLM reviewer agents capable of structured reasoning using a relevant-paper-aware training method. Furthermore, we construct ReviewAgents, a multi-role, multi-LLM agent review framework, to enhance the review comment generation process. Additionally, we propose ReviewBench, a benchmark for evaluating the review comments generated by LLMs. Our experimental results on ReviewBench demonstrate that while existing LLMs exhibit a certain degree of potential for automating the review process, there remains a gap when compared to human-generated reviews. Moreover, our ReviewAgents framework further narrows this gap, outperforming advanced LLMs in generating review comments.
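
To make the multi-role, multi-LLM setup concrete, here is a rough sketch of how several role-specialized reviewer agents could draft structured comments that an area-chair-style agent then consolidates. This is an assumption about the wiring, not the paper's implementation; `LLMFn`, the role names, and the prompts are illustrative placeholders.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for any LLM backend (an API call or a locally
# fine-tuned reviewer model would plug in here).
LLMFn = Callable[[str], str]


def run_reviewer(llm: LLMFn, role: str, paper_text: str, related_papers: List[str]) -> str:
    """One role-specialized reviewer agent: the prompt supplies the paper plus
    retrieved related work and asks for a staged, CoT-style review."""
    prompt = (
        f"You are a {role} reviewer.\n"
        "Related work:\n" + "\n".join(related_papers) + "\n\n"
        f"Paper:\n{paper_text}\n\n"
        "Write a review with sections: Summary, Related Work, Strengths, "
        "Weaknesses, Conclusion."
    )
    return llm(prompt)


def run_review_pipeline(llm: LLMFn, paper_text: str, related_papers: List[str]) -> str:
    """Multi-role coordination: several reviewer roles draft independently,
    then an area-chair-style agent merges the drafts into one final review."""
    roles = ["methodology-focused", "novelty-focused", "clarity-focused"]
    drafts: Dict[str, str] = {
        role: run_reviewer(llm, role, paper_text, related_papers) for role in roles
    }
    merge_prompt = (
        "You are an area chair. Consolidate these reviews into one coherent, "
        "non-redundant review:\n\n"
        + "\n\n".join(f"[{role}]\n{text}" for role, text in drafts.items())
    )
    return llm(merge_prompt)
```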
Problem

Research questions and friction points this paper is trying to address.

Automating academic paper reviews to save time
Generating accurate and reasoning-consistent review comments
Bridging the gap between AI and human-generated reviews
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages large language models for review generation
Introduces Review-CoT dataset for structured reasoning training
Develops ReviewAgents framework for multi-role review enhancement
🔎 Similar Papers
No similar papers found.
Xian Gao
Shanghai Jiao Tong University
LLM, Multi-modal, AI for Education
Jiacheng Ruan
Shanghai Jiao Tong University
Vision-language model, Parameter-efficient fine-tuning, Medical image, Light-weight model
Jingsheng Gao
Shanghai Jiao Tong University
Ting Liu
Shanghai Jiao Tong University
Yuzhuo Fu
Shanghai Jiao Tong University