LoRACode: LoRA Adapters for Code Embeddings

📅 2025-03-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing code embedding models suffer from inadequate fine-grained syntactic and contextual modeling; open-source alternatives (e.g., CodeBERT, UniXcoder) exhibit poor scalability, while proprietary models incur prohibitive computational costs. To address these limitations, we propose a task- and language-decoupled LoRA adapter framework—the first to enable parameter-efficient fine-tuning along both *task type* (Code2Code vs. Text2Code) and *programming language* dimensions, introducing fewer than 2% additional parameters. The framework is trained end-to-end on a 2-million-sample multilingual code corpus and achieves full adapter adaptation within 25 minutes on two H100 GPUs. Experimental results demonstrate substantial improvements: +9.1% MRR in Code2Code retrieval and up to +86.69% accuracy gain in Text2Code tasks. Our approach significantly enhances semantic retrieval precision while drastically improving deployment efficiency and model adaptability across diverse programming languages and downstream tasks.

Technology Category

Application Category

📝 Abstract

Code embeddings are essential for semantic code search; however, current approaches often struggle to capture the precise syntactic and contextual nuances inherent in code. Open-source models such as CodeBERT and UniXcoder exhibit limitations in scalability and efficiency, while high-performing proprietary systems impose substantial computational costs. We introduce a parameter-efficient fine-tuning method based on Low-Rank Adaptation (LoRA) to construct task-specific adapters for code retrieval. Our approach reduces the number of trainable parameters to less than two percent of the base model, enabling rapid fine-tuning on extensive code corpora (2 million samples in 25 minutes on two H100 GPUs). Experiments demonstrate an increase of up to 9.1% in Mean Reciprocal Rank (MRR) for Code2Code search, and up to 86.69% for Text2Code search tasks across multiple programming languages. Distinction in task-wise and language-wise adaptation helps explore the sensitivity of code retrieval for syntactical and linguistic variations.

Problem

Research questions and friction points this paper is trying to address.

Enhances code embeddings for semantic code search accuracy.

Reduces computational costs and improves scalability in code retrieval.

Adapts to syntactical and linguistic variations across programming languages.

Innovation

Methods, ideas, or system contributions that make the work stand out.

LoRA-based parameter-efficient fine-tuning method

Reduces trainable parameters to less than 2%

Improves Code2Code and Text2Code search performance

🔎 Similar Papers

No similar papers found.

Authors to Follow