ProDCARL: Reinforcement Learning-Aligned Diffusion Models for De Novo Antimicrobial Peptide Design

📅 2026-01-29
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses a limitation of conventional generative models in antimicrobial peptide (AMP) design: they do not explicitly optimize for both high antimicrobial activity and low toxicity. It introduces a reinforcement learning alignment mechanism into a diffusion model. Building on the EvoDiff OA-DM 38M architecture, the framework uses AMP activity and toxicity predictors to guide sequence generation, combining top-k policy gradients, entropy regularization, and early stopping to raise predicted quality while preserving sequence diversity. In in silico experiments, the mean predicted AMP score rises from 0.081 after fine-tuning to 0.178, and 6.3% of candidates meet the joint high-quality criterion (pAMP > 0.7 and pTox < 0.3). Sequence diversity, measured as 1 - mean pairwise identity, remains high at 0.929.
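The combination of top-k policy gradients, entropy regularization, and early stopping can be illustrated with a minimal sketch. It is not the paper's implementation: the batch-mean baseline, the entropy coefficient, the patience-based stopping rule, and the function names `topk_policy_gradient_loss` and `should_stop` are all assumptions made for illustration, and plain floats stand in for the differentiable tensors a real training loop would use.

```python
def topk_policy_gradient_loss(log_probs, rewards, entropies, k, entropy_coef=0.01):
    """Surrogate loss computed over the k highest-reward samples of a batch.

    log_probs[i] : log-probability of sample i under the current policy
    rewards[i]   : scalar reward of sample i (assumed to come from classifiers)
    entropies[i] : policy entropy associated with sample i
    Returns (loss, indices of the selected top-k samples).
    """
    n = len(rewards)
    if not (len(log_probs) == n == len(entropies)):
        raise ValueError("log_probs, rewards and entropies must have equal length")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= batch size")

    # Keep only the k samples with the highest reward.
    top = sorted(range(n), key=lambda i: rewards[i], reverse=True)[:k]

    # Assumption: a batch-mean baseline is used to reduce variance.
    baseline = sum(rewards) / n

    # REINFORCE-style term: raise the log-probability of above-baseline samples.
    pg_term = -sum((rewards[i] - baseline) * log_probs[i] for i in top) / k

    # Entropy bonus: subtracting it from the loss rewards more diverse policies.
    entropy_term = sum(entropies[i] for i in top) / k
    return pg_term - entropy_coef * entropy_term, top


def should_stop(metric_history, patience=3):
    """Hypothetical early-stopping rule: stop once the monitored metric has
    not improved over its earlier best for `patience` consecutive evaluations."""
    if len(metric_history) <= patience:
        return False
    best_before = max(metric_history[:-patience])
    return max(metric_history[-patience:]) <= best_before
```

In this sketch, restricting the update to the top-k samples concentrates learning on the best candidates, while the entropy term and the stopping rule act as brakes against collapsing onto a few high-reward sequences.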

๐Ÿ“ Abstract
Antimicrobial resistance threatens healthcare sustainability and motivates low-cost computational discovery of antimicrobial peptides (AMPs). De novo peptide generation must optimize antimicrobial activity and safety through low predicted toxicity, but likelihood-trained generators do not enforce these goals explicitly. We introduce ProDCARL, a reinforcement-learning alignment framework that couples a diffusion-based protein generator (EvoDiff OA-DM 38M) with sequence property predictors for AMP activity and peptide toxicity. We fine-tune the diffusion prior on AMP sequences to obtain a domain-aware generator. Top-k policy-gradient updates use classifier-derived rewards plus entropy regularization and early stopping to preserve diversity and reduce reward hacking. In silico experiments show ProDCARL increases the mean predicted AMP score from 0.081 after fine-tuning to 0.178. The joint high-quality hit rate reaches 6.3% with pAMP > 0.7 and pTox < 0.3. ProDCARL maintains high diversity, with 1 - mean pairwise identity equal to 0.929. Qualitative analyses with AlphaFold3 and ProtBERT embeddings suggest candidates show plausible AMP-like structural and semantic characteristics. ProDCARL serves as a candidate generator that narrows experimental search space, and experimental validation remains future work.
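The two headline metrics, the joint hit rate and the diversity score, can be computed along the following lines. This is a sketch under stated assumptions: the abstract does not say how pairwise identity is calculated, so an ungapped position-by-position comparison normalized by the longer sequence is assumed here, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Fraction of matching positions between two sequences.

    Assumption: ungapped comparison, normalized by the longer sequence;
    an alignment-based identity may be used in practice.
    """
    if not a and not b:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def diversity(seqs):
    """Diversity as 1 - mean pairwise identity over all unordered pairs."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    mean_identity = sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_identity


def joint_hit_rate(p_amp, p_tox, amp_min=0.7, tox_max=0.3):
    """Share of candidates with pAMP > amp_min and pTox < tox_max."""
    if len(p_amp) != len(p_tox) or not p_amp:
        raise ValueError("score lists must be non-empty and of equal length")
    hits = sum(a > amp_min and t < tox_max for a, t in zip(p_amp, p_tox))
    return hits / len(p_amp)


# Toy inputs for illustration only; these are not sequences from the paper.
toy_seqs = ["KKLL", "KKLA", "GGGG"]
toy_diversity = diversity(toy_seqs)
toy_hit_rate = joint_hit_rate([0.8, 0.9, 0.5, 0.75], [0.1, 0.4, 0.2, 0.29])
```

Because the hit criterion is a conjunction, a candidate with a high predicted AMP score but a predicted toxicity at or above the threshold does not count as a hit.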
Problem

Research questions and friction points this paper is trying to address.

antimicrobial peptides
de novo design
toxicity prediction
antimicrobial activity
computational discovery
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion models
reinforcement learning alignment
de novo peptide design
antimicrobial peptides
reward hacking mitigation