Condor: A Code Discriminator Integrating General Semantics with Code Details

📅 2024-12-23
🏛️ arXiv.org
📈 Citations: 2
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address the challenge of accurately identifying correct implementations among multiple code candidates generated by large language models (LLMs), this paper proposes a fine-grained, highly flexible non-execution-based code discriminator. Methodologically, it introduces a novel integration of contrastive learning with intermediate-state modeling of code edits, constructing semantically progressive code pairs to enrich training data and designing a Transformer-based discriminator to enable fine-grained representation alignment. This framework overcomes the limitation of conventional discriminators in capturing subtle semantic distinctions. Experiments demonstrate significant improvements: +6.0 percentage points in F1 score on CodeNanoFix; Pass@1 of 62.63% (+10.0 points) on CodeNanoFix using Meta-Llama-3.1-Instruct (70B); and a 147.05% relative gain in Pass@1 on APPS, validating its strong generalization capability and practical utility.

๐Ÿ“ Abstract
LLMs demonstrate significant potential across various software engineering tasks. However, they still face challenges in generating correct code on the first attempt when addressing complex requirements. Introducing a discriminator to select reliable outputs from multiple generated results is an effective way to enhance their reliability and stability. Currently, these discriminators fall into two categories: execution-based discriminators and non-execution-based discriminators. Execution-based discriminators face flexibility challenges due to difficulties in obtaining test cases and security concerns, while non-execution-based discriminators, although more flexible, struggle to capture subtle differences in code details. To maintain flexibility while improving the model's ability to capture fine-grained code details, this paper proposes Condor. We first design contrastive learning to optimize the code representations of the base model, enabling it to reflect differences in code details. Then, we leverage intermediate data from the code modification process to further enrich the discriminator's training data, enhancing its ability to discern code details. Experimental results indicate that on the subtle code difference dataset (i.e., CodeNanoFix), Condor significantly outperforms other discriminators in discriminative performance: Condor (1.3B) improves the discriminative F1 score of DeepSeek-Coder (1.3B) from 67% to 73%. In discriminating LLM-generated outputs, Condor (1.3B) and Condor (110M) raise the Pass@1 score of Meta-Llama-3.1-Instruct (70B) on the CodeNanoFix dataset from 52.64% to 62.63% and 59.64%, respectively. Moreover, Condor demonstrates strong generalization capabilities on the MBPP and APPS datasets. For example, Condor (1.3B) improves the Pass@1 of Meta-Llama-3.1-Instruct (70B) on the APPS dataset by 147.05%.
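The selection step described in the abstract (picking one reliable output from several generated candidates) can be sketched as best-of-N reranking. This is a minimal, hypothetical illustration: Condor's real discriminator is a fine-tuned Transformer that scores problem/code pairs, whereas the `score` stub below is only a token-overlap placeholder.

```python
# Minimal sketch of discriminator-based reranking (best-of-N selection).
# The `score` function is a hypothetical stand-in: a real discriminator
# would embed the (problem, code) pair and output a correctness estimate.

def score(problem: str, code: str) -> float:
    # Placeholder heuristic: count shared whitespace-separated tokens.
    return float(len(set(problem.split()) & set(code.split())))

def select_best(problem: str, candidates: list[str]) -> str:
    # Rank the LLM-generated candidates and keep the highest-scoring one;
    # Pass@1 is then measured on this single selected program.
    return max(candidates, key=lambda c: score(problem, c))
```

In this setup, Pass@1 is computed on the single program the discriminator ranks highest, which is how gains over direct sampling are measured.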
Problem

Research questions and friction points this paper is trying to address.

How to reliably identify correct implementations among multiple LLM-generated code candidates
How to keep a discriminator flexible (non-execution-based) while still capturing fine-grained code details
How to obtain training data rich enough to teach subtle semantic distinctions between similar programs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive learning optimizes the base model's code representations so that they reflect fine-grained detail differences.
Intermediate states from the code modification process enrich the discriminator's training data.
The resulting Condor discriminator significantly improves the accuracy of code selected from LLM outputs.
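The contrastive-learning idea above, pulling representations of matching code together and pushing differing code apart, can be illustrated with a generic InfoNCE-style loss. This is a sketch under the assumption that code embeddings are already computed; it is not Condor's exact training objective.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, tau=0.1):
    # Generic contrastive objective: the loss is low when the anchor is
    # closer to the positive (matching) code than to any negative
    # (semantically differing) code, and high otherwise.
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = [s / tau for s in sims]
    m = max(logits)  # shift for numerical stability
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[0] / sum(exps))
```

During training, pairs drawn from the code modification process (e.g. successive intermediate states of an edit) would supply the anchor, positive, and negative embeddings.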
Qingyuan Liang
Peking University
Software Engineering, Code Generation
Zhao Zhang
Key Lab of HCST (PKU), MOE; SCS, Peking University, China
Chen Liu
Key Lab of HCST (PKU), MOE; SCS, Peking University, China
Zeyu Sun
National Key Laboratory of Space Integrated Information System, Institute of Software, Chinese Academy of Sciences, China
Wenjie Zhang
National University of Singapore, Singapore
Yizhou Chen
Peking University
AI4SE, Vulnerability Detection, Formal Verification
Zixiao Zhao
Key Lab of HCST (PKU), MOE; SCS, Peking University, China
Qi Luo
Southern University of Science and Technology, China
Wentao Wang
Key Lab of HCST (PKU), MOE; SCS, Peking University, China
Yanjie Jiang
Tianjin University
Software Refactoring and Testing
Yingfei Xiong
Associate Professor, Peking University
Software Engineering, Programming Languages, Program Repair, Program Synthesis, Program Analysis
Lu Zhang
Key Lab of HCST (PKU), MOE; SCS, Peking University, China