Learn from Your Mistakes: Tree-like Self-Play for Secure Code LLMs

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the tendency of large language models to reproduce subtle security vulnerabilities from their training data during code generation. Existing alignment techniques optimize at the sequence level and therefore struggle to correct such localized errors. To overcome this limitation, the authors propose Tree-like Self-Play (TSP), a mechanism that frames secure code generation as a fine-grained sequential decision process. TSP uses self-play to generate both secure code paths and vulnerable variants, so the model learns to correct itself at the critical decision nodes where vulnerabilities emerge. This yields a dense, on-policy learning signal that encourages the model to internalize abstract, language-agnostic security principles. On Python security benchmarks, TSP raises the pass rate (SPR@1) of CodeLlama-7B to 75.8%, well above supervised fine-tuning (57.0%). It also reduces vulnerabilities in unseen categories (CWEs) by 24.5% and transfers security principles learned from C/C++ to Python, Go, and JavaScript.

📝 Abstract

While Large Language Models (LLMs) excel in code generation, they remain prone to replicating subtle yet critical vulnerabilities endemic to their training data. Current alignment techniques, such as Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), typically apply coarse-grained optimization at the sequence level. This approach often fails to address the localized nature of security flaws, where a single incorrect token choice can compromise an entire program. To bridge this gap, we introduce Tree-like Self-Play (TSP), a framework that reframes secure code generation as a fine-grained sequential decision process. Unlike standard methods that blindly maximize likelihood, TSP constructs a decision tree where the model explores branching trajectories, generating both secure "golden paths" and vulnerable variants. By treating code generation as a self-play game, the model learns to strictly discriminate against its own localized errors. This provides a dense, on-policy learning signal that forces self-correction precisely at the critical decision nodes where vulnerabilities typically emerge. Our experiments demonstrate that TSP fundamentally enhances model reliability. In Python security benchmarks, TSP boosts CodeLlama-7B's pass rate (SPR@1) to 75.8%, significantly outperforming SFT (57.0%) and unstructured self-play baselines. Crucially, TSP induces robust out-of-distribution generalization: the model not only reduces vulnerabilities in unseen categories (CWEs) by 24.5% but also successfully transfers security principles learned from C/C++ to diverse languages, including Python, Go, and JavaScript. This suggests that TSP does not merely memorize patches, but internalizes abstract, language-agnostic security logic.
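
The abstract does not spell out how the decision tree is built or which loss is used, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a secure "golden path" and its vulnerable variants are available as token lists, locates the first token at which each variant branches away, and records a (shared prefix, secure continuation, vulnerable continuation) triple at that decision node. The names `BranchPair`, `first_divergence`, and `build_branch_pairs`, and the toy SQL-injection tokens, are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BranchPair:
    """One decision node: a shared prefix and two competing continuations."""
    prefix: List[str]    # tokens common to both trajectories
    chosen: List[str]    # continuation along the secure "golden path"
    rejected: List[str]  # continuation along the vulnerable variant


def first_divergence(golden: List[str], variant: List[str]) -> int:
    """Return the index of the first token where the trajectories differ."""
    shared = min(len(golden), len(variant))
    for i in range(shared):
        if golden[i] != variant[i]:
            return i
    return shared


def build_branch_pairs(golden: List[str],
                       variants: List[List[str]]) -> List[BranchPair]:
    """Pair the golden path with each variant at the node where they split."""
    pairs = []
    for variant in variants:
        if variant == golden:
            continue  # identical trajectory: no decision node to learn from
        i = first_divergence(golden, variant)
        pairs.append(BranchPair(golden[:i], golden[i:], variant[i:]))
    return pairs


# Toy example (assumed tokenization): parameterized query vs. string formatting.
golden = ["cursor", ".execute(", "sql", ",", "(user_id,)", ")"]
variants = [
    ["cursor", ".execute(", "sql", "%", "user_id", ")"],
    ["cursor", ".execute(", "f_string_sql", ")"],
]
pairs = build_branch_pairs(golden, variants)
for p in pairs:
    print(len(p.prefix), p.chosen, p.rejected)
```

Triples of this shape could feed a preference-style objective that compares the two continuations given the same prefix, which is one plausible way to obtain a learning signal concentrated at the branching token rather than spread over the whole sequence. Whether TSP uses such an objective is not stated in the abstract.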

Problem

Research questions and friction points this paper is trying to address.

code generation

security vulnerabilities

fine-grained optimization

language models

secure coding

Innovation

Methods, ideas, or system contributions that make the work stand out.

Tree-like Self-Play

Secure Code Generation

Fine-grained Decision Process