Leveraging LLMs for Semi-Automatic Corpus Filtration in Systematic Literature Reviews

📅 2025-10-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Keyword-based retrieval in systematic literature reviews (SLRs) incurs high manual screening effort and low precision. Method: This paper proposes a semi-automated screening framework driven by consensus among multiple large language models (LLMs). It integrates state-of-the-art open-source and commercial LLMs (2024–2025), employs descriptive prompting for paper classification, generates initial labels via a weighted consensus mechanism, and incorporates human-in-the-loop supervision with real-time correction. A visual interactive tool, LLMSurver, enables human-AI collaborative decision-making. Results: Evaluated on over 8,000 real candidate papers, the framework substantially reduces manual screening workload, achieves lower error rates than individual human experts, and demonstrates that modern open-source LLMs deliver sufficient performance, offering high accuracy, strong interpretability, low cost, and broad applicability.

📝 Abstract
The creation of systematic literature reviews (SLR) is critical for analyzing the landscape of a research field and guiding future research directions. However, retrieving and filtering the literature corpus for an SLR is highly time-consuming and requires extensive manual effort, as keyword-based searches in digital libraries often return numerous irrelevant publications. In this work, we propose a pipeline leveraging multiple large language models (LLMs), classifying papers based on descriptive prompts and deciding jointly using a consensus scheme. The entire process is human-supervised and interactively controlled via our open-source visual analytics web interface, LLMSurver, which enables real-time inspection and modification of model outputs. We evaluate our approach using ground-truth data from a recent SLR comprising over 8,000 candidate papers, benchmarking both open and commercial state-of-the-art LLMs from mid-2024 and fall 2025. Results demonstrate that our pipeline significantly reduces manual effort while achieving lower error rates than single human annotators. Furthermore, modern open-source models prove sufficient for this task, making the method accessible and cost-effective. Overall, our work demonstrates how responsible human-AI collaboration can accelerate and enhance systematic literature reviews within academic workflows.
Problem

Research questions and friction points this paper is trying to address.

Automating literature corpus filtration in systematic reviews
Reducing manual effort in paper classification using LLMs
Enhancing SLR efficiency through human-supervised AI collaboration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging multiple LLMs for paper classification
Using consensus scheme for joint decision-making
Human-supervised interactive visual analytics interface
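The consensus scheme described above is not detailed in this summary, so the following is a minimal sketch of what a weighted multi-LLM vote over include/exclude labels could look like. The model names, weights, labels, and the tie-handling behavior are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of weighted multi-LLM consensus for paper screening.
# Model names, weights, and labels are assumptions for illustration only.
from collections import defaultdict

def consensus_label(votes: dict[str, str], weights: dict[str, float]) -> str:
    """Aggregate per-model screening labels into one joint decision.

    votes   -- model name -> predicted label ("include" / "exclude")
    weights -- model name -> trust weight (e.g. from validation accuracy)
    """
    score: dict[str, float] = defaultdict(float)
    for model, label in votes.items():
        score[label] += weights.get(model, 1.0)
    # In a human-in-the-loop setup, ties or low-margin decisions would be
    # routed to a reviewer; here we simply take the highest-weighted label.
    return max(score, key=score.get)

votes = {"model_a": "include", "model_b": "exclude", "model_c": "include"}
weights = {"model_a": 0.9, "model_b": 0.7, "model_c": 0.8}
print(consensus_label(votes, weights))  # -> include (score 1.7 vs 0.7)
```

In practice the margin between the top two label scores could serve as a confidence signal, flagging borderline papers for the kind of real-time human inspection the interface supports.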
Lucas Joos
University of Konstanz, Konstanz, Germany
Daniel A. Keim
University of Konstanz, Konstanz, Germany
Maximilian T. Fischer
Postdoctoral Research Fellow, University of Konstanz
Visual Data Analysis, Interactive Data Analysis, Visual Analytics, Communication Analysis