ProDCARL: Reinforcement Learning-Aligned Diffusion Models for De Novo Antimicrobial Peptide Design

📅 2026-01-29

🤖 AI Summary

This work addresses a limitation of conventional generative models for antimicrobial peptide (AMP) design: they do not explicitly optimize for both high antimicrobial activity and low toxicity. It does so by introducing, for the first time, a reinforcement learning alignment mechanism into a diffusion model. Building upon the EvoDiff OA-DM 38M architecture, fine-tuned on AMP sequences, the framework uses AMP activity and toxicity predictors as reward signals to guide sequence generation. It combines top-k policy-gradient updates, entropy regularization, and early stopping to improve predicted properties while preserving sequence diversity and limiting reward hacking. In silico results show the mean predicted AMP score rising from 0.081 after fine-tuning to 0.178 after alignment. In addition, 6.3% of candidates meet the joint high-quality criterion (pAMP > 0.7 and pTox < 0.3), and sequence diversity, measured as 1 − mean pairwise identity, remains high at 0.929. Experimental validation remains future work.
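
To make the training signal concrete, the following is a minimal sketch of a top-k policy-gradient surrogate loss with an entropy bonus, written on plain floats. It is not the authors' implementation: the function names, the mean-reward baseline, the reward combination `p_amp * (1 - p_tox)`, and the `entropy_coef` value are all illustrative assumptions, and a real training loop would compute the same expression on autograd tensors.

```python
from typing import Sequence


def combined_reward(p_amp: float, p_tox: float) -> float:
    """Hypothetical reward: high when predicted activity is high and toxicity is low.

    The actual reward shaping used by the paper is not given in the summary.
    """
    return p_amp * (1.0 - p_tox)


def topk_policy_gradient_loss(
    log_probs: Sequence[float],
    rewards: Sequence[float],
    entropies: Sequence[float],
    k: int,
    entropy_coef: float = 0.01,
) -> float:
    """REINFORCE-style surrogate loss restricted to the k highest-reward samples.

    log_probs : log-probability of each sampled sequence under the policy
    rewards   : scalar reward of each sampled sequence
    entropies : per-sample policy entropy (assumed to be available)
    """
    n = len(rewards)
    if not (len(log_probs) == n == len(entropies)):
        raise ValueError("inputs must have the same length")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the batch size")

    # Mean reward over the whole batch serves as a simple baseline (assumption).
    baseline = sum(rewards) / n

    # Keep only the indices of the k best-scoring samples.
    top = sorted(range(n), key=lambda i: rewards[i], reverse=True)[:k]

    # Policy-gradient term: minimizing it raises the log-probability of
    # samples whose reward exceeds the baseline.
    pg_term = -sum((rewards[i] - baseline) * log_probs[i] for i in top) / k

    # Entropy bonus: subtracting it rewards higher entropy, which
    # discourages collapse onto a few sequences.
    entropy_term = sum(entropies) / n
    return pg_term - entropy_coef * entropy_term


if __name__ == "__main__":
    loss = topk_policy_gradient_loss(
        log_probs=[-1.0, -2.0, -3.0, -4.0],
        rewards=[0.9, 0.1, 0.5, 0.3],
        entropies=[1.0, 1.0, 1.0, 1.0],
        k=2,
        entropy_coef=0.1,
    )
    print(round(loss, 6))
```

Restricting the update to the top-k samples focuses learning on the best candidates in each batch, while the entropy term pulls in the opposite direction to keep the sampling distribution broad.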

📝 Abstract

Antimicrobial resistance threatens healthcare sustainability and motivates low-cost computational discovery of antimicrobial peptides (AMPs). De novo peptide generation must optimize antimicrobial activity and safety through low predicted toxicity, but likelihood-trained generators do not enforce these goals explicitly. We introduce ProDCARL, a reinforcement-learning alignment framework that couples a diffusion-based protein generator (EvoDiff OA-DM 38M) with sequence property predictors for AMP activity and peptide toxicity. We fine-tune the diffusion prior on AMP sequences to obtain a domain-aware generator. Top-k policy-gradient updates use classifier-derived rewards plus entropy regularization and early stopping to preserve diversity and reduce reward hacking. In silico experiments show ProDCARL increases the mean predicted AMP score from 0.081 after fine-tuning to 0.178. The joint high-quality hit rate reaches 6.3% with pAMP > 0.7 and pTox < 0.3. ProDCARL maintains high diversity, with 1 − mean pairwise identity equal to 0.929. Qualitative analyses with AlphaFold3 and ProtBERT embeddings suggest candidates show plausible AMP-like structural and semantic characteristics. ProDCARL serves as a candidate generator that narrows experimental search space, and experimental validation remains future work.
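
The two summary metrics quoted in the abstract can be illustrated with a short sketch. The thresholds (pAMP > 0.7, pTox < 0.3) and the definition of diversity as 1 − mean pairwise identity come from the abstract; how identity is computed is not stated there, so the position-wise, alignment-free identity below (normalized by the longer sequence) is an assumption, as are all function names.

```python
from itertools import combinations
from typing import Sequence


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of matching positions, normalized by the longer sequence.

    Alignment-free simplification (assumption); a real pipeline would
    more likely align the two sequences first.
    """
    if not a and not b:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def diversity(seqs: Sequence[str]) -> float:
    """Diversity as 1 - mean pairwise identity over all unordered pairs."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0  # fewer than two sequences: no diversity to measure
    mean_identity = sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_identity


def joint_hit_rate(
    p_amp: Sequence[float],
    p_tox: Sequence[float],
    amp_min: float = 0.7,
    tox_max: float = 0.3,
) -> float:
    """Share of candidates with pAMP > amp_min and pTox < tox_max."""
    if len(p_amp) != len(p_tox) or not p_amp:
        raise ValueError("score lists must be non-empty and of equal length")
    hits = sum(a > amp_min and t < tox_max for a, t in zip(p_amp, p_tox))
    return hits / len(p_amp)


if __name__ == "__main__":
    # Toy peptides and made-up predictor scores, for illustration only.
    peptides = ["AKKA", "AKRA", "GLFD"]
    print(diversity(peptides))
    print(joint_hit_rate([0.9, 0.8, 0.2, 0.75], [0.1, 0.5, 0.1, 0.29]))
```

Both criteria must hold at once for a candidate to count as a hit, so a peptide with high predicted activity but a predicted toxicity of 0.5 is excluded.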

Problem

Research questions and friction points this paper is trying to address.

antimicrobial peptides

de novo design

toxicity prediction

antimicrobial activity

computational discovery

Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion models

reinforcement learning alignment

de novo peptide design

antimicrobial peptides

reward hacking mitigation
