🤖 AI Summary
This survey addresses the core challenges of code generation for low-resource programming languages (LRPLs) and domain-specific languages (DSLs): severe data scarcity, pronounced syntactic specificity, and poor coverage in general-purpose pretraining corpora. We systematically analyze 111 studies published between 2020 and 2024. Methodologically, we propose the first dedicated survey framework for LRPLs and DSLs, categorizing evaluation techniques into four types and performance-enhancement methods into six classes, identifying emerging adaptation architectures, and pinpointing the critical absence of standardized benchmarks. Through bibliometric analysis, cross-lingual capability assessment, dissection of dataset strategies, and comparative evaluation with multidimensional quality metrics (including CodeBLEU and functional correctness), we empirically delineate the capabilities and limitations of mainstream models such as Codex and CodeLlama. Our contributions include a reusable methodological taxonomy and practical guidelines, laying a foundation for standardization and future research in LRPL/DSL code generation.
📝 Abstract
Large Language Models (LLMs) have shown impressive capabilities in code generation for popular programming languages. However, their performance on Low-Resource Programming Languages (LRPLs) and Domain-Specific Languages (DSLs) remains a significant challenge, affecting millions of developers (3.5 million Rust users alone) who cannot fully utilize LLM capabilities. LRPLs and DSLs encounter unique obstacles, including data scarcity and, for DSLs, specialized syntax that is poorly represented in general-purpose datasets. Addressing these challenges is crucial, as LRPLs and DSLs enhance development efficiency in specialized domains such as finance and science. While several surveys discuss LLMs in software engineering, none focus specifically on the challenges and opportunities associated with LRPLs and DSLs. Our survey fills this gap by systematically reviewing the current state, methodologies, and challenges in leveraging LLMs for code generation in these languages. We filtered 111 papers from over 27,000 studies published between 2020 and 2024 to evaluate the capabilities and limitations of LLMs in LRPLs and DSLs. We report the LLMs used, the benchmarks and metrics used for evaluation, strategies for enhancing performance, and methods for dataset collection and curation. We identified four main evaluation techniques and several metrics for assessing code generation in LRPLs and DSLs. Our analysis categorizes improvement methods into six groups and summarizes novel architectures proposed by researchers. Despite this variety of techniques and metrics, a standard approach and benchmark dataset for evaluating code generation in LRPLs and DSLs are still lacking. This survey serves as a resource for researchers and practitioners at the intersection of LLMs, software engineering, and specialized programming languages, laying the groundwork for future advancements in code generation for LRPLs and DSLs.
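The survey itself prescribes no single implementation of its metrics, but as a concrete anchor for the functional-correctness evaluation it discusses, the following sketch shows the widely used unbiased pass@k estimator (introduced with Codex in Chen et al., 2021). The function name and arguments here are illustrative, not taken from the survey:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator for functional correctness.

    n: total samples generated per problem
    c: number of those samples that pass the unit tests
    k: budget of samples a user would draw

    Returns the probability that at least one of k samples
    drawn without replacement from the n generations is correct,
    computed as 1 - C(n - c, k) / C(n, k).
    """
    if n - c < k:
        # Fewer failing samples than the draw size: success is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples, 2 correct -> pass@1 estimate is 0.2
print(pass_at_k(n=10, c=2, k=1))
```

For LRPLs and DSLs the hard part is not this arithmetic but obtaining the per-language test suites that determine `c`, which is one reason the survey highlights the lack of standardized benchmarks.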