Early Comparative Evaluation of Transformer Models for Multilingual Software Vulnerability Detection

📅 2026-06-09
🤖 AI Summary
This study addresses the lack of systematic evaluation of how mainstream Transformer models perform across programming languages in software vulnerability detection. Using the CVEFixes dataset, the authors present an early comparison of BERT, RoBERTa, and CodeBERT on a binary vulnerability detection task for HTML, Python, JavaScript, and PHP, evaluated with language-wise three-fold stratified cross-validation. The results show clear performance differences across languages, suggesting that these Transformer models are not uniformly robust in multilingual settings. The findings point to a need for modeling strategies that are better attuned to the syntactic and semantic characteristics of individual programming languages.
📝 Abstract
Software vulnerability detection is increasingly important as modern applications combine multiple programming languages. This paper presents an early comparative evaluation of BERT, RoBERTa, and CodeBERT for binary vulnerability detection across HTML, Python, JavaScript, and PHP using the CVEFixes dataset and language-wise three-fold stratified cross-validation. The results show clear performance differences across languages, indicating that multilingual vulnerability detection requires more language-aware and robust transformer-based modelling strategies.
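To make "language-wise three-fold stratified cross-validation" concrete, the following is a minimal sketch of one way such splits could be built. It is not the authors' pipeline. The function names `stratified_folds` and `language_wise_splits`, the `(language, label)` sample layout, and the toy data are all assumptions made for illustration. Each language is split on its own, and within a language every fold keeps roughly the same ratio of vulnerable to non-vulnerable samples.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=3, seed=0):
    """Assign positions 0..len(labels)-1 to k folds, balancing labels per fold.

    Hypothetical helper: samples of each label are shuffled, then dealt
    round-robin into the folds so every fold gets a similar label ratio.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for pos, label in enumerate(labels):
        by_label[label].append(pos)
    folds = [[] for _ in range(k)]
    for positions in by_label.values():
        rng.shuffle(positions)
        for i, pos in enumerate(positions):
            folds[i % k].append(pos)
    return folds


def language_wise_splits(samples, k=3, seed=0):
    """Yield (language, fold, train_indices, test_indices) per language.

    `samples` is a list of (language, label) pairs (an assumed layout);
    the returned indices point back into `samples`. Training and test
    data for a given split always come from the same language.
    """
    by_lang = defaultdict(list)
    for idx, (lang, _label) in enumerate(samples):
        by_lang[lang].append(idx)
    for lang, idxs in sorted(by_lang.items()):
        labels = [samples[i][1] for i in idxs]
        folds = stratified_folds(labels, k=k, seed=seed)
        for f in range(k):
            test = [idxs[p] for p in folds[f]]
            train = [idxs[p] for m in range(k) if m != f for p in folds[m]]
            yield lang, f, train, test


# Synthetic toy data (not from CVEFixes): per language, 6 vulnerable (1)
# and 9 non-vulnerable (0) samples.
toy_samples = [
    (lang, label)
    for lang in ("HTML", "Python", "JavaScript", "PHP")
    for label in [1] * 6 + [0] * 9
]
splits = list(language_wise_splits(toy_samples, k=3, seed=0))
```

In practice a model such as BERT, RoBERTa, or CodeBERT would be fine-tuned on each `train` index set and scored on the matching `test` set, with results reported per language. A library splitter such as scikit-learn's `StratifiedKFold` could replace the hand-written helper. Whether the paper used it is not stated here.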
Problem

Research questions and friction points this paper is trying to address.

software vulnerability detection
multilingual
transformer models
programming languages
CVEFixes
Innovation

Methods, ideas, or system contributions that make the work stand out.

multilingual vulnerability detection
transformer models
comparative evaluation
language-aware modeling
CVEFixes dataset