🤖 AI Summary
Systematic literature reviews (SLRs) suffer from slow manual screening, high omission rates, and the inherent limitations of keyword-based retrieval. To address these challenges, we propose LLMSurver, the first interactive, visualization-augmented framework designed specifically for the initial screening phase of SLRs. It integrates large language models (LLMs), prompt engineering, a multi-model voting consensus mechanism, and an interpretable result-evaluation module. Through human-in-the-loop iterative querying and dynamic feedback, LLMSurver substantially improves screening precision and traceability. Evaluated on a real-world corpus of more than 8,300 publications, it achieves recall above 98.8%, cuts screening time from weeks to minutes, and surpasses human annotators in both accuracy and inter-annotator consistency. The framework supports mainstream open- and closed-source LLMs, is fully open source, and enables complete reproducibility, establishing a new paradigm for efficient, trustworthy, and interpretable SLR automation.
📝 Abstract
Systematic literature reviews (SLRs) are essential but labor-intensive due to high publication volumes and inefficient keyword-based filtering. To streamline this process, we evaluate Large Language Models (LLMs) for enhancing efficiency and accuracy in corpus filtration while minimizing manual effort. Our open-source tool LLMSurver provides a visual interface for using LLMs to filter literature, evaluate the results, and refine queries interactively. We assess the real-world performance of our approach by filtering over 8.3k articles during a recent survey construction and comparing the results with human efforts. The findings show that recent LLMs can reduce filtering time from weeks to minutes. A consensus scheme ensures recall rates >98.8%, surpassing typical human error thresholds and improving selection accuracy. This work advances literature review methodologies and highlights the potential of responsible human-AI collaboration in academic research.
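To make the consensus scheme concrete, here is a minimal sketch of how a recall-oriented multi-model vote might combine per-paper include/exclude decisions from several LLMs. The function name, labels, and tie-breaking rule are illustrative assumptions for this note, not LLMSurver's actual implementation.

```python
from collections import Counter

def consensus_vote(decisions: list[str], mode: str = "inclusive") -> str:
    """Combine include/exclude decisions from several LLMs for one paper.

    decisions: per-model labels, each "include" or "exclude".
    mode: "inclusive" keeps a paper if any model votes include
          (maximizes recall); "majority" uses the majority label.
    """
    if mode == "inclusive":
        # Recall-oriented: a single include vote keeps the paper
        # for human review, so true positives are rarely lost.
        return "include" if "include" in decisions else "exclude"
    # Majority vote; ties fall back to "include" to protect recall.
    counts = Counter(decisions)
    return "include" if counts["include"] >= counts["exclude"] else "exclude"

# Example: three hypothetical model verdicts for one candidate paper.
votes = ["include", "exclude", "include"]
print(consensus_vote(votes, mode="majority"))  # -> "include"
```

Biasing ties and borderline cases toward inclusion is one plausible way a scheme like this keeps recall above a target such as 98.8%, at the cost of passing more candidates on to human review.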