Exploring different approaches to customize language models for domain-specific text-to-code generation

📅 2026-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge that general-purpose large language models often struggle to accurately invoke library functions and adhere to domain-specific conventions when generating code for specialized frameworks such as Scikit-learn and OpenCV. To systematically evaluate customization strategies, the authors construct a synthetic programming dataset spanning general Python, Scikit-learn, and OpenCV, and assess three approaches—few-shot prompting, retrieval-augmented generation (RAG), and LoRA fine-tuning—within a unified framework. The study presents the first comparative analysis of prompt engineering versus parameter-efficient fine-tuning in domain-specific code generation, revealing practical trade-offs among accuracy, cost, and flexibility. Experimental results demonstrate that LoRA fine-tuning significantly outperforms prompt-based methods in both accuracy and domain alignment, whereas few-shot prompting and RAG, while improving relevance, offer limited gains in overall correctness.
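The few-shot prompting baseline evaluated above can be sketched as simple prompt assembly: domain-specific task/code example pairs are concatenated ahead of the new request. A minimal illustration, assuming a hypothetical prompt template and example pairs (not the paper's actual prompts):

```python
# Sketch of few-shot prompt assembly for domain-specific code
# generation. Template and example pairs are illustrative only.

FEW_SHOT_EXAMPLES = [
    {
        "task": "Standardize a feature matrix X.",
        "code": "from sklearn.preprocessing import StandardScaler\n"
                "X_scaled = StandardScaler().fit_transform(X)",
    },
    {
        "task": "Split data into train and test sets (80/20).",
        "code": "from sklearn.model_selection import train_test_split\n"
                "X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2)",
    },
]

def build_prompt(task: str, examples=FEW_SHOT_EXAMPLES) -> str:
    """Concatenate domain-specific example pairs ahead of the new task."""
    parts = ["Generate Python code using Scikit-learn.\n"]
    for ex in examples:
        parts.append(f"### Task:\n{ex['task']}\n### Code:\n{ex['code']}\n")
    parts.append(f"### Task:\n{task}\n### Code:\n")
    return "\n".join(parts)

prompt = build_prompt("Fit a logistic regression classifier on X_tr, y_tr.")
print(prompt)
```

The model then completes the final `### Code:` slot; because the in-context examples use the target library's APIs, completions tend to follow the same conventions, which matches the relevance (but not accuracy) gains reported above.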

📝 Abstract
Large language models (LLMs) have demonstrated strong capabilities in generating executable code from natural language descriptions. However, general-purpose models often struggle in specialized programming contexts where domain-specific libraries, APIs, or conventions must be used. Customizing smaller open-source models offers a cost-effective alternative to relying on large proprietary systems. In this work, we investigate how smaller language models can be adapted for domain-specific code generation using synthetic datasets. We construct datasets of programming exercises across three domains within the Python ecosystem: general Python programming, Scikit-learn machine learning workflows, and OpenCV-based computer vision tasks. Using these datasets, we evaluate three customization strategies: few-shot prompting, retrieval-augmented generation (RAG), and parameter-efficient fine-tuning using Low-Rank Adaptation (LoRA). Performance is evaluated using both benchmark-based metrics and similarity-based metrics that measure alignment with domain-specific code. Our results show that prompting-based approaches such as few-shot learning and RAG can improve domain relevance in a cost-effective manner, although their impact on benchmark accuracy is limited. In contrast, LoRA-based fine-tuning consistently achieves higher accuracy and stronger domain alignment across most tasks. These findings highlight practical trade-offs between flexibility, computational cost, and performance when adapting smaller language models for specialized programming tasks.
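The retrieval step of the RAG strategy described in the abstract can be sketched with a toy lexical retriever: rank a small corpus of domain snippets by cosine similarity over bag-of-words counts, then prepend the best match to the prompt. The corpus and scoring below are illustrative assumptions, not the paper's actual retrieval setup:

```python
# Sketch of the retrieval step in a RAG pipeline for code generation.
# A real system would use dense embeddings; bag-of-words cosine
# similarity stands in here to keep the example self-contained.
import math
from collections import Counter

CORPUS = [
    "cv2.imread loads an image from disk as a NumPy array",
    "cv2.cvtColor converts an image between color spaces",
    "sklearn Pipeline chains preprocessing steps and a final estimator",
]

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus=CORPUS) -> str:
    """Return the corpus snippet most lexically similar to the query."""
    q = Counter(query.lower().split())
    return max(corpus, key=lambda d: cosine(q, Counter(d.lower().split())))

def augment(query: str) -> str:
    """Prepend the retrieved domain snippet to the generation prompt."""
    context = retrieve(query)
    return f"Context: {context}\nTask: {query}\nCode:"

print(augment("convert an image to grayscale with cv2"))
```

Because retrieval injects library-specific documentation into the prompt at inference time, the base model needs no weight updates, which is the cost/flexibility side of the trade-off the abstract contrasts with LoRA fine-tuning.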
Problem

Research questions and friction points this paper is trying to address.

domain-specific code generation
language model customization
text-to-code
specialized programming contexts
executable code generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

domain-specific code generation
Low-Rank Adaptation (LoRA)
retrieval-augmented generation (RAG)
synthetic dataset
parameter-efficient fine-tuning