PrivCode++: Latent-Conditioned Differentially Private Code Generation for Comprehensive Guarantees

📅 2026-06-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing differentially private code generation methods protect only the code snippets and assume prompts are public, which fails when prompts also contain sensitive information. When prompts cannot be learned or used during generation, code quality and diversity degrade substantially. This work presents what the authors describe as the first framework to treat both prompts and code snippets as sensitive in LLM fine-tuning. It introduces a Privacy-Free Latent Conditioning module within a two-stage DP framework, enabling DP fine-tuning and data synthesis without direct access to the sensitive prompts or code. In the reported experiments, the method achieves substantially higher utility than baselines, remains competitive with a method that relaxes the privacy assumptions, and provides stronger privacy guarantees.

📝 Abstract

Large language models fine-tuned on instruction-code pairs may memorize and subsequently leak sensitive training data. Existing differentially private (DP) code generation methods primarily protect code snippets while assuming prompts are public, an assumption that fails in realistic scenarios where prompts may also contain sensitive information. When prompts cannot be explicitly learned or used during generation, code synthesis suffers from severe utility degradation as well as reduced diversity and fidelity. To address these challenges, we propose PrivCode-Plus, the first work to explore DP code generation where both prompts and code snippets are considered sensitive in LLM fine-tuning. PrivCode-Plus introduces a two-stage DP framework with a Privacy-Free Latent Conditioning module, enabling effective DP fine-tuning and data synthesis without direct access to sensitive prompts or code. Extensive experiments show that PrivCode-Plus achieves substantially higher utility than baselines, remains competitive with a method that relaxes the privacy assumptions, and provides stronger privacy guarantees.
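
The abstract does not say which DP mechanism the fine-tuning stages use. As background only, the sketch below shows the clip-and-noise gradient step of DP-SGD, a common choice for DP fine-tuning. It is an assumption for illustration, not the PrivCode-Plus procedure. The function names and the parameters `max_norm` and `noise_multiplier` are hypothetical.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale one per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average.

    Assumption: noise standard deviation is noise_multiplier * max_norm,
    as in the standard DP-SGD recipe; this is not taken from the paper.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        clipped = clip_gradient(grad, max_norm)
        for i in range(dim):
            total[i] += clipped[i]
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.0, 0.5]]  # toy per-example gradients
    print(dp_sgd_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single training example can shift the update, and the noise scale is tied to that bound. A privacy accountant, omitted here, would convert the noise multiplier and number of steps into an overall privacy budget.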

Problem

Research questions and friction points this paper is trying to address.

differentially private code generation

sensitive prompts

code synthesis

privacy guarantees

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

differentially private code generation

latent conditioning

privacy-preserving LLM fine-tuning