AI Summary
This work addresses the scarcity of code review datasets aligned with real-world security assessment scenarios, which has hindered research on automated security-focused code review. To bridge this gap, the authors propose an active learning-driven ensemble classification approach, integrated with iterative human annotation, to efficiently identify and construct the first large-scale, multilingual dataset of security-relevant code reviews that closely mirrors real-world distributions. The resulting dataset comprises 6,732 high-quality security review samples, whose distribution has been statistically validated against actual practice. This resource not only reflects authentic security review patterns but also establishes a foundational benchmark and evaluation framework for future research on security-oriented code review generation.
Abstract
Software security vulnerabilities can lead to severe consequences, making early detection essential. Although code review serves as a critical defense mechanism against security flaws, relevant feedback remains scarce due to limited attention to security issues or a lack of expertise among reviewers. Existing datasets and studies primarily focus on general-purpose code review comments, either lacking security-specific annotations or being too limited in scale to support large-scale research. To bridge this gap, we introduce SeRe, a security-related code review dataset, constructed using an active learning-based ensemble classification approach. The proposed approach iteratively refines model predictions through human annotations, achieving high precision while maintaining reasonable recall. Using the fine-tuned ensemble classifier, we extracted 6,732 security-related reviews from 373,824 raw review instances, ensuring representativeness across multiple programming languages. Statistical analysis indicates that SeRe generally aligns with real-world security-related review distribution. To assess both the utility of SeRe and the effectiveness of existing code review comment generation approaches, we benchmark state-of-the-art approaches on security-related feedback generation. By releasing SeRe along with our benchmark results, we aim to advance research in automated security-focused code review and contribute to the development of more effective secure software engineering practices.
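To make the pipeline concrete, the following is a minimal sketch of pool-based active learning with an ensemble classifier, the general technique the abstract describes. It is not the authors' implementation: the synthetic data, the two ensemble members, the uncertainty-sampling query strategy, and all parameters (seed size, query budget, number of rounds) are illustrative assumptions. In the real setting, the "human label" step would be the iterative annotation the paper describes, and the features would come from review-comment text.

```python
# Sketch of active-learning-based ensemble classification (illustrative only;
# not the SeRe authors' actual pipeline or hyperparameters).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for review instances; security-relevant reviews are rare,
# so the positive class is heavily under-represented.
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)

# Seed the labeled set with a few examples of each class (in practice, an
# initial batch of human-annotated reviews).
pos = np.where(y == 1)[0]
neg = np.where(y == 0)[0]
labeled = list(pos[:10]) + list(neg[:40])
pool = [i for i in range(len(X)) if i not in set(labeled)]

for _ in range(5):  # each round stands in for one human-annotation cycle
    # Fit each ensemble member on the currently labeled data.
    members = [LogisticRegression(max_iter=1000),
               RandomForestClassifier(random_state=0)]
    probs = []
    for m in members:
        m.fit(X[labeled], y[labeled])
        probs.append(m.predict_proba(X[pool])[:, 1])
    p = np.mean(probs, axis=0)        # ensemble score = mean member probability

    # Uncertainty sampling: query the samples closest to the decision boundary.
    uncertainty = -np.abs(p - 0.5)    # larger value = closer to 0.5
    picked = np.argsort(uncertainty)[-20:]

    # "Human annotation": here we reveal the known ground-truth labels.
    for idx in picked:
        labeled.append(pool[idx])
    pool = [i for j, i in enumerate(pool) if j not in set(picked)]

print(len(labeled))  # 50 seed + 5 rounds x 20 queries = 150
```

Averaging member probabilities is one simple way to combine the ensemble; querying near-boundary samples concentrates annotation effort where the classifier is least certain, which is what lets a small annotation budget refine precision over successive rounds.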