🤖 AI Summary
This work addresses the tendency of code generation models to produce insecure code and their susceptibility to adversarial prompts. To mitigate these risks, it proposes PurpCode, the first post-training recipe tailored for secure code reasoning, which trains the model in two stages: a Rule Learning stage that teaches the model to reference cybersafety rules, followed by a Reinforcement Learning stage driven by multi-objective rewards. Internal rule-guided red-teaming supplies high-coverage unsafe prompts for training, and jointly optimizing safety defense and generation utility substantially reduces false-rejection rates on legitimate queries. The resulting PurpCode-32B model achieves state-of-the-art cybersecurity safety, significantly outperforming mainstream baselines, while preserving strong code generation quality and a robust grasp of security-related knowledge.
📝 Abstract
We introduce PurpCode, the first post-training recipe for training safe code reasoning models toward generating secure code and defending against malicious cyberactivities. PurpCode trains a reasoning model in two stages: (i) Rule Learning, which explicitly teaches the model to reference cybersafety rules to generate vulnerability-free code and to avoid facilitating malicious cyberactivities; and (ii) Reinforcement Learning, which optimizes model safety and preserves model utility through diverse, multi-objective reward mechanisms. To supply the training pipeline with comprehensive cybersafety data, we conduct internal red-teaming to synthesize high-coverage prompts, grounded in real-world tasks, for inducing unsafe cyberactivities in the model. Based on PurpCode, we develop a reasoning-based coding model, namely PurpCode-32B, which demonstrates state-of-the-art cybersafety, outperforming various frontier models. Meanwhile, our alignment method decreases the model's overrefusal rates in both general and cybersafety-specific scenarios, while preserving model utility in both code generation and common security knowledge.
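The abstract describes a Reinforcement Learning stage that balances several objectives: rewarding safe, useful outputs while penalizing overrefusal on legitimate queries. As a rough illustration of how such objectives can be scalarized into a single reward signal, here is a minimal, hypothetical sketch; the function name, signals, and weights are illustrative assumptions, not the paper's actual reward design.

```python
def combined_reward(safe: bool, tests_passed: float, refused: bool,
                    benign_prompt: bool,
                    w_safety: float = 1.0, w_utility: float = 1.0,
                    w_overrefusal: float = 1.0) -> float:
    """Scalarize multiple training objectives into one RL reward.

    safe:          whether the response avoided unsafe content/vulnerable code
    tests_passed:  fraction of functional unit tests passed (utility signal)
    refused:       whether the model declined to answer
    benign_prompt: whether the prompt was a legitimate (non-malicious) query
    """
    reward = 0.0
    reward += w_safety * (1.0 if safe else -1.0)   # penalize unsafe generations
    reward += w_utility * tests_passed             # preserve code-generation utility
    if refused and benign_prompt:                  # discourage overrefusal
        reward -= w_overrefusal
    return reward
```

For example, a safe, non-refusing answer to a benign prompt that passes 80% of its tests would score `1.0 + 0.8 = 1.8`, whereas an unsafe refusal-free answer failing all tests would score `-1.0`. A weighted sum like this is the simplest form of multi-objective scalarization; the paper's "diverse, multi-objective reward mechanisms" are presumably richer than this sketch.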