Early Comparative Evaluation of Transformer Models for Multilingual Software Vulnerability Detection

📅 2026-06-09

🤖 AI Summary

This study addresses the lack of systematic evaluation of how mainstream Transformer models perform across programming languages in multilingual software vulnerability detection. Using the CVEFixes dataset, the authors present an early comparative evaluation of BERT, RoBERTa, and CodeBERT on a binary vulnerability detection task for HTML, Python, JavaScript, and PHP, with three-fold stratified cross-validation run separately for each language. The results show clear performance differences across languages, which suggests that these Transformer models are not yet robust in multilingual settings. The findings point to a need for modelling strategies better attuned to the syntactic and semantic characteristics of individual programming languages.

📝 Abstract

Software vulnerability detection is increasingly important as modern applications combine multiple programming languages. This paper presents an early comparative evaluation of BERT, RoBERTa, and CodeBERT for binary vulnerability detection across HTML, Python, JavaScript, and PHP using the CVEFixes dataset and language-wise three-fold stratified cross-validation. The results show clear performance differences across languages, indicating that multilingual vulnerability detection requires more language-aware and robust transformer-based modelling strategies.
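
To make "language-wise three-fold stratified cross-validation" concrete, here is a minimal sketch of how such splits could be built. It is not the authors' code: the function names (`stratified_folds`, `language_wise_splits`), the toy data, and the round-robin assignment are assumptions chosen for illustration, and in practice a library routine such as scikit-learn's `StratifiedKFold` would typically be used instead. The idea is that samples are grouped by language first, and within each language the vulnerable and non-vulnerable samples are spread evenly over three folds.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=3, seed=0):
    """Assign sample positions to k folds, keeping class ratios similar (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for pos, label in enumerate(labels):
        by_class[label].append(pos)
    folds = [[] for _ in range(k)]
    for positions in by_class.values():
        rng.shuffle(positions)
        # Deal the shuffled positions round-robin so each fold gets about 1/k of the class.
        for n, pos in enumerate(positions):
            folds[n % k].append(pos)
    return folds


def language_wise_splits(samples, k=3, seed=0):
    """Yield (language, fold_no, train_idx, test_idx), splitting each language on its own.

    `samples` is a list of (language, label) pairs; indices refer to that list.
    """
    by_lang = defaultdict(list)
    for idx, (lang, _label) in enumerate(samples):
        by_lang[lang].append(idx)
    for lang, idxs in sorted(by_lang.items()):
        labels = [samples[i][1] for i in idxs]
        for fold_no, test_local in enumerate(stratified_folds(labels, k, seed)):
            test_idx = sorted(idxs[j] for j in test_local)
            held_out = set(test_idx)
            train_idx = [i for i in idxs if i not in held_out]
            yield lang, fold_no, train_idx, test_idx


# Toy data, purely illustrative (not drawn from CVEFixes):
# 6 non-vulnerable (0) and 6 vulnerable (1) samples per language.
toy = [(lang, label) for lang in ("php", "python") for label in (0, 1) for _ in range(6)]
splits = list(language_wise_splits(toy))
for lang, fold_no, train_idx, test_idx in splits:
    vulnerable = sum(toy[i][1] for i in test_idx)
    print(lang, fold_no, len(train_idx), len(test_idx), vulnerable)
```

Each model would then be trained and evaluated once per (language, fold) pair, so per-language scores can be compared without samples from one language leaking into another language's test set.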

Problem

Research questions and friction points this paper is trying to address.

software vulnerability detection

multilingual

transformer models

programming languages

CVEFixes

Innovation

Methods, ideas, or system contributions that make the work stand out.

multilingual vulnerability detection

transformer models

comparative evaluation