BACE: LLM-based Code Generation through Bayesian Anchored Co-Evolution of Code and Test Populations

πŸ“… 2026-03-30
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses a vulnerability of large language models (LLMs) in code generation: erroneous or trivial auto-generated tests can reject valid solutions or degrade performance. To mitigate semantic drift in self-verification loops, the study introduces Bayesian inference into a code–test co-evolution framework for the first time. By modeling belief distributions over noisy interaction evidence and anchoring updates with a small set of public examples, the method recursively refines the joint evolutionary trajectory of code and test populations. Integrating multi-agent co-evolution with test-driven generation, the approach significantly outperforms existing methods on the LiveCodeBench v6 benchmark (post-March 2025) and demonstrates effectiveness across both open- and closed-source small-scale language models.
πŸ“ Abstract
Large Language Models (LLMs) have demonstrated impressive capabilities in code generation. While an interactive feedback loop can improve performance, writing effective tests is a non-trivial task. Early multi-agent frameworks, such as AgentCoder, automated this process but relied on generated tests as absolute ground truth. This approach is fragile: incorrect code frequently passes faulty or trivial tests, while valid solutions are often degraded to satisfy incorrect assertions. Addressing this limitation, newer methods have largely abandoned test generation in favor of planning and reasoning based on examples. We argue, however, that generated tests remain a valuable signal if they are modeled as noisy sensors whose readings drive Bayesian belief updates. To this end, we introduce BACE (Bayesian Anchored Co-Evolution), a framework that reformulates synthesis as a Bayesian co-evolutionary process where code and test populations are evolved, guided by belief distributions that are reciprocally updated based on noisy interaction evidence. By anchoring this search on minimal public examples, BACE prevents the co-evolutionary drift typical of self-validating loops. Extensive evaluations on LiveCodeBench v6 (post-March 2025) reveal that BACE achieves superior performance across both proprietary models and open-weight small language models.
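The "tests as noisy sensors" idea in the abstract can be made concrete with a small Bayesian-update sketch. This is an illustrative toy, not the paper's actual algorithm: the sensor error rates (`FN_RATE`, `FP_RATE`), the anchor error rates, and the `Belief` class are all assumptions made up for this example. It shows how a belief in a candidate program's correctness is revised after each noisy generated-test outcome, and how a near-noiseless public example acts as an anchor.

```python
from dataclasses import dataclass

# Assumed sensor-error rates for generated tests (illustrative, not from the paper):
FN_RATE = 0.15  # assumed P(test fails | code is correct)  -- flaky/wrong assertion
FP_RATE = 0.30  # assumed P(test passes | code is incorrect)  -- trivial test

@dataclass
class Belief:
    """Belief that one candidate program is correct (hypothetical helper)."""
    p_correct: float = 0.5  # uninformative prior

    def update(self, passed: bool,
               fn_rate: float = FN_RATE, fp_rate: float = FP_RATE) -> None:
        """Bayes' rule, treating one test outcome as a noisy sensor reading."""
        if passed:
            like_correct, like_incorrect = 1.0 - fn_rate, fp_rate
        else:
            like_correct, like_incorrect = fn_rate, 1.0 - fp_rate
        num = like_correct * self.p_correct
        den = num + like_incorrect * (1.0 - self.p_correct)
        self.p_correct = num / den

belief = Belief()
# Four noisy generated tests: three pass, one fails.
for outcome in (True, True, False, True):
    belief.update(outcome)
# A public example is a near-noiseless anchor (tiny assumed error rates),
# so passing it pulls the belief strongly toward "correct".
belief.update(True, fn_rate=0.01, fp_rate=0.01)
print(f"posterior P(correct) = {belief.p_correct:.3f}")
```

Note how the anchor dominates: a single low-noise observation moves the posterior far more than any one noisy generated test, which is the intuition behind anchoring the co-evolutionary search on public examples.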
Problem

Research questions and friction points this paper is trying to address.

code generation
test generation
Bayesian inference
co-evolution
LLM
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian co-evolution
code generation
test generation
LLM-based synthesis
noisy feedback
πŸ”Ž Similar Papers
No similar papers found.