AI Summary
This study addresses the lack of systematic evaluation of how mainstream Transformer models perform across different programming languages in multilingual software vulnerability detection. Using the CVEFixes dataset, the authors present an early comparative evaluation of BERT, RoBERTa, and CodeBERT on a binary vulnerability detection task for HTML, Python, JavaScript, and PHP, with three-fold stratified cross-validation run separately for each language. The results show clear performance differences across languages, which suggests that these Transformer models are not equally robust in multilingual settings. These findings underscore the need for modelling strategies that are better attuned to the syntactic and semantic characteristics of individual programming languages.
Abstract
Software vulnerability detection is increasingly important as modern applications combine multiple programming languages. This paper presents an early comparative evaluation of BERT, RoBERTa, and CodeBERT for binary vulnerability detection across HTML, Python, JavaScript, and PHP using the CVEFixes dataset and language-wise three-fold stratified cross-validation. The results show clear performance differences across languages, indicating that multilingual vulnerability detection requires more language-aware and robust transformer-based modelling strategies.
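To make the evaluation protocol concrete, the following is a minimal sketch of language-wise three-fold stratified cross-validation, assuming each sample is a dictionary with hypothetical `language` and `label` keys. The helper names, the round-robin fold assignment, and the fixed seed are illustrative assumptions and not the authors' implementation; model fine-tuning is left out entirely.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=3, seed=0):
    """Assign sample indices to k folds while keeping label ratios similar.

    Assumption: a simple shuffle plus round-robin deal per class, which is
    one common way to stratify; the paper does not specify its procedure.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Deal this class's indices round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def language_wise_splits(samples, k=3, seed=0):
    """Yield (language, fold_number, train, test), one language at a time."""
    by_language = defaultdict(list)
    for sample in samples:
        by_language[sample["language"]].append(sample)

    for language in sorted(by_language):
        subset = by_language[language]
        folds = stratified_folds([s["label"] for s in subset], k=k, seed=seed)
        for fold_number, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train = [s for i, s in enumerate(subset) if i not in held_out]
            test = [subset[i] for i in test_idx]
            yield language, fold_number, train, test


# Toy data (hypothetical): 6 vulnerable and 6 non-vulnerable samples per language.
toy_samples = [
    {"language": lang, "label": label, "code": f"{lang}-{label}-{i}"}
    for lang in ("HTML", "Python", "JavaScript", "PHP")
    for label in (0, 1)
    for i in range(6)
]

splits = list(language_wise_splits(toy_samples, k=3, seed=0))
for language, fold_number, train, test in splits:
    positives = sum(s["label"] for s in test)
    print(language, fold_number, len(train), len(test), positives)
```

In a full pipeline, each `(train, test)` pair would be used to fine-tune and score one model (for example BERT, RoBERTa, or CodeBERT), and the per-fold scores would be averaged per language. Splitting inside each language, instead of pooling all languages first, is what allows per-language results to be compared directly.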