CodeChemist: Functional Knowledge Transfer for Low-Resource Code Generation via Test-Time Scaling

📅 2025-10-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the poor code-generation performance of low-resource programming languages caused by insufficient training data, this paper proposes CodeChemist, a novel framework that combines test-time scaling with cross-lingual functional knowledge transfer and requires no fine-tuning. Specifically, CodeChemist automatically generates and executes test cases in a high-resource language (e.g., Python), employs multi-temperature hedged sampling to produce diverse candidate programs in the low-resource language, and filters and re-ranks those candidates by their test pass rates to transfer functional semantics precisely. Experimental results show that CodeChemist significantly outperforms existing test-time scaling methods across multiple low-resource languages, including Rust, Go, and JavaScript, achieving an average 12.7% improvement in pass@1 on the HumanEval-X benchmark and validating both its effectiveness and its generalization across programming languages.

📝 Abstract
Code Large Language Models (CodeLLMs) are increasingly used in code generation tasks across a wide range of applications. However, their performance is often inconsistent across different programming languages (PLs), with low-resource PLs suffering the most due to limited training data. In this paper, we present CodeChemist, a novel and efficient framework for test-time scaling that enables functional knowledge transfer from high-resource to low-resource PLs using generated test cases. CodeChemist first generates and executes code in high-resource PLs to create test cases that encapsulate functional knowledge. It then uses multi-temperature hedged sampling to generate code snippets in the low-resource PL and selects the best one based on the pass rate of the test cases. Our extensive experiments show that CodeChemist outperforms existing test-time scaling approaches, boosting the performance of code generation for low-resource PLs without requiring any model retraining.
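The abstract's final step, selecting the candidate with the highest test pass rate, can be sketched as below. This is a toy illustration, not the paper's implementation: candidates are plain Python callables and test cases are (arguments, expected-output) pairs, whereas CodeChemist executes generated code in the target language against tests transferred from a high-resource one.

```python
# Toy sketch of pass-rate-based candidate selection (assumption: candidates
# are Python callables; the real system runs low-resource-PL code snippets).

def pass_rate(candidate, test_cases):
    """Fraction of test cases the candidate passes; crashes count as failures."""
    passed = 0
    for args, expected in test_cases:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing candidate simply fails that test
    return passed / len(test_cases)

def select_best(candidates, test_cases):
    """Rank sampled candidates by pass rate and return the top one."""
    return max(candidates, key=lambda c: pass_rate(c, test_cases))

# Demo: three candidate implementations of absolute value.
tests = [((3,), 3), ((-3,), 3), ((0,), 0)]
candidates = [
    lambda x: x,                   # wrong on negatives
    lambda x: -x if x < 0 else x,  # correct
    lambda x: x * x,               # wrong except at 0 and 1
]
best = select_best(candidates, tests)  # picks the correct implementation
```

Crashing candidates are treated the same as failing ones, so a snippet that does not even run can never outrank one that passes some tests.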
Problem

Research questions and friction points this paper is trying to address.

Transferring functional knowledge from high-resource to low-resource programming languages
Improving code generation consistency across different programming languages
Enhancing low-resource language performance without model retraining
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transfers functional knowledge via test cases
Uses multi-temperature hedged sampling technique
Selects best code via test case pass rate
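The multi-temperature hedged sampling idea above can be illustrated with a short sketch: rather than drawing every candidate at a single temperature, the sampling budget is split across a ladder of temperatures, so the candidate pool mixes conservative and exploratory generations. Here `sample_code` is a hypothetical stand-in for a CodeLLM call, not an API from the paper.

```python
import random

def sample_code(prompt, temperature, rng):
    # Placeholder for a CodeLLM call: higher temperature stands in for
    # more varied output (simulated here with random noise).
    noise = rng.random() * temperature
    return f"// candidate for {prompt!r} (t={temperature}, noise={noise:.2f})"

def multi_temperature_sampling(prompt, temperatures, budget, seed=0):
    """Split `budget` samples evenly across the temperature ladder."""
    rng = random.Random(seed)
    per_temp = budget // len(temperatures)
    pool = []
    for t in temperatures:
        for _ in range(per_temp):
            pool.append(sample_code(prompt, t, rng))
    return pool

pool = multi_temperature_sampling("abs in Rust", [0.2, 0.6, 1.0], budget=9)
```

The hedge is that no single temperature has to be right: low temperatures favor the model's most likely (often most syntactically safe) program, while high temperatures add the diversity the pass-rate filter needs to find a functionally correct outlier.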
Kaixin Wang
Xi'an Jiaotong University
Tianlin Li
Nanyang Technological University
AI4SE · SE4AI · Trustworthy AI
Xiaoyu Zhang
Nanyang Technological University
Aishan Liu
Beihang University
Xianglong Liu
Beihang University
Ziqi Liu
Ant Group
Zhiqiang Zhang
Ant Group
Jun Zhou
Ant Group
Bin Shi
Xi'an Jiaotong University
Virtualization · Data Mining