ReviewAgents: Bridging the Gap Between Human and AI-Generated Paper Reviews

📅 2025-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: AI-generated peer-review comments still lag behind human reviewers in comprehensiveness, factual accuracy, and logical consistency. Method: This paper introduces ReviewAgents, a framework comprising (i) Review-CoT, a novel structured-reasoning dataset for review generation; (ii) a multi-role large language model (LLM) collaboration mechanism for peer review; and (iii) ReviewBench, a dedicated evaluation benchmark. The approach combines relevant-paper-aware training, chain-of-thought (CoT)-guided structured fine-tuning, and role-specialized multi-agent coordination. Contribution/Results: On ReviewBench, ReviewAgents outperforms advanced LLMs in comprehensiveness, accuracy, and reasoning consistency, producing reviews closer to those of human experts and narrowing, though not yet closing, the gap between AI- and human-generated reviews.
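
The staged reasoning the summary describes (summarize the paper, relate it to prior work, weigh strengths and weaknesses, then conclude) can be pictured as a structured training target. Below is a minimal Python sketch of such a target; the class, field, and tag names are illustrative assumptions, not the actual Review-CoT schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StructuredReview:
    """Hypothetical container mirroring the staged reasoning described above:
    summary -> related work -> strengths/weaknesses -> conclusion."""
    paper_summary: str
    related_work_notes: List[str] = field(default_factory=list)
    strengths: List[str] = field(default_factory=list)
    weaknesses: List[str] = field(default_factory=list)
    conclusion: str = ""


def to_cot_target(review: StructuredReview) -> str:
    """Serialize the staged review into a single chain-of-thought style string,
    the kind of target a CoT-guided fine-tuning pass might train on."""
    return "\n".join([
        f"[SUMMARY] {review.paper_summary}",
        "[RELATED WORK] " + "; ".join(review.related_work_notes),
        "[STRENGTHS] " + "; ".join(review.strengths),
        "[WEAKNESSES] " + "; ".join(review.weaknesses),
        f"[CONCLUSION] {review.conclusion}",
    ])
```

A Review-CoT-style dataset entry would presumably pair the paper text (plus retrieved related papers) with one such serialized target per human review, though the exact format is not specified here.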

📝 Abstract
Academic paper review is a critical yet time-consuming task within the research community. With the increasing volume of academic publications, automating the review process has become a significant challenge. The primary issue lies in generating comprehensive, accurate, and reasoning-consistent review comments that align with human reviewers' judgments. In this paper, we address this challenge by proposing ReviewAgents, a framework that leverages large language models (LLMs) to generate academic paper reviews. We first introduce a novel dataset, Review-CoT, consisting of 142k review comments, designed for training LLM agents. This dataset emulates the structured reasoning process of human reviewers: summarizing the paper, referencing relevant works, identifying strengths and weaknesses, and generating a review conclusion. Building upon this, we train LLM reviewer agents capable of structured reasoning using a relevant-paper-aware training method. Furthermore, we construct ReviewAgents, a multi-role, multi-LLM agent review framework, to enhance the review comment generation process. Additionally, we propose ReviewBench, a benchmark for evaluating the review comments generated by LLMs. Our experimental results on ReviewBench demonstrate that while existing LLMs exhibit a certain degree of potential for automating the review process, there remains a gap when compared to human-generated reviews. Moreover, our ReviewAgents framework further narrows this gap, outperforming advanced LLMs in generating review comments.
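
To make the multi-role, multi-LLM setup concrete, here is a rough sketch of how several role-specialized reviewer agents could draft structured comments that an area-chair-style agent then consolidates. This is an assumption about the wiring, not the paper's implementation; `LLMFn`, the role names, and the prompts are illustrative placeholders.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for any LLM backend (an API call or a locally
# fine-tuned reviewer model would plug in here).
LLMFn = Callable[[str], str]


def run_reviewer(llm: LLMFn, role: str, paper_text: str, related_papers: List[str]) -> str:
    """One role-specialized reviewer agent: the prompt supplies the paper plus
    retrieved related work and asks for a staged, CoT-style review."""
    prompt = (
        f"You are a {role} reviewer.\n"
        "Related work:\n" + "\n".join(related_papers) + "\n\n"
        f"Paper:\n{paper_text}\n\n"
        "Write a review with sections: Summary, Related Work, Strengths, "
        "Weaknesses, Conclusion."
    )
    return llm(prompt)


def run_review_pipeline(llm: LLMFn, paper_text: str, related_papers: List[str]) -> str:
    """Multi-role coordination: several reviewer roles draft independently,
    then an area-chair-style agent merges the drafts into one final review."""
    roles = ["methodology-focused", "novelty-focused", "clarity-focused"]
    drafts: Dict[str, str] = {
        role: run_reviewer(llm, role, paper_text, related_papers) for role in roles
    }
    merge_prompt = (
        "You are an area chair. Consolidate these reviews into one coherent, "
        "non-redundant review:\n\n"
        + "\n\n".join(f"[{role}]\n{text}" for role, text in drafts.items())
    )
    return llm(merge_prompt)
```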
Problem

Research questions and friction points this paper is trying to address.

Automating academic paper reviews to save time
Generating accurate and reasoning-consistent review comments
Bridging the gap between AI and human-generated reviews
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages large language models for review generation
Introduces Review-CoT dataset for structured reasoning training
Develops ReviewAgents framework for multi-role review enhancement
🔎 Similar Papers
No similar papers found.
Xian Gao
Shanghai Jiao Tong University
LLM, Multi-modal, AI for Education
Jiacheng Ruan
Shanghai Jiao Tong University
Vision-language model, Parameter-efficient fine-tuning, Medical image, Light-weight model
Jingsheng Gao
Shanghai Jiao Tong University
Ting Liu
Shanghai Jiao Tong University
Yuzhuo Fu
Shanghai Jiao Tong University