Property-Isometric Variational Autoencoders for Sequence Modeling and Design

📅 2025-09-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Biological sequence design requires optimizing high-dimensional, continuous functional properties of DNA, RNA, or peptides, such as fluorescence emission spectra, photostability, or antimicrobial activity. Existing models, however, rely on simple binary labels (e.g., binding/non-binding) and do not capture the geometry of such property spaces. To address this, we propose PrIVAE, a variational autoencoder with graph neural network encoder layers and an isometric regularizer. Both components are guided by a nearest-neighbor graph over the property space, so that the latent space preserves the local geometry of the property manifold. The trained decoder then supports rational design of sequences with desired high-dimensional properties. PrIVAE retains high sequence reconstruction accuracy while organizing the latent space by property, and it is evaluated on two generative tasks: DNA sequences that template fluorescent metal nanoclusters, and antimicrobial peptides. In wet-lab experiments, sampled sequences yielded up to 16.1-fold enrichment of rare-property DNA nanoclusters relative to their abundance in the training data.
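The enrichment figure compares how often a rare property appears among designed sequences with how often it appears in the training data. The sketch below assumes the definition is a ratio of two fractions: the fraction of designed sequences that have the property, divided by the fraction of training sequences that have it. The function name and the toy counts are hypothetical and are not taken from the paper.

```python
def enrichment_fold(n_hit_designed: int, n_designed: int,
                    n_hit_train: int, n_train: int) -> float:
    """Fold enrichment of a rare property in designed vs. training sequences.

    Assumed definition (hypothetical helper, not from the paper):
        (n_hit_designed / n_designed) / (n_hit_train / n_train)
    """
    if n_designed <= 0 or n_train <= 0 or n_hit_train <= 0:
        raise ValueError("denominators must be positive")
    # Cross-multiply so integer inputs are divided only once.
    return (n_hit_designed * n_train) / (n_designed * n_hit_train)


# Hypothetical toy counts (not the paper's data): 2 of 100 training sequences
# and 8 of 50 designed sequences show the rare property.
print(enrichment_fold(8, 50, 2, 100))  # → 8.0
```

In this example the rare class makes up 2% of the training set and 16% of the designed set, which gives an 8-fold enrichment.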

📝 Abstract
Biological sequence design (DNA, RNA, or peptides) with desired functional properties has applications in discovering novel nanomaterials, biosensors, antimicrobial drugs, and beyond. One common challenge is the ability to optimize complex high-dimensional properties such as target emission spectra of DNA-mediated fluorescent nanoparticles, photo- and chemical stability, and antimicrobial activity of peptides across target microbes. Existing models rely on simple binary labels (e.g., binding/non-binding) rather than high-dimensional complex properties. To address this gap, we propose a geometry-preserving variational autoencoder framework, called PrIVAE, which learns latent sequence embeddings that respect the geometry of their property space. Specifically, we model the property space as a high-dimensional manifold that can be locally approximated by a nearest-neighbor graph, given an appropriately defined distance measure. We employ the property graph to guide the sequence latent representations using (1) graph neural network encoder layers and (2) an isometric regularizer. PrIVAE learns a property-organized latent space that enables rational design of new sequences with desired properties by employing the trained decoder. We evaluate the utility of our framework for two generative tasks: (1) design of DNA sequences that template fluorescent metal nanoclusters and (2) design of antimicrobial peptides. The trained models retain high reconstruction accuracy while organizing the latent space according to properties. Beyond in silico experiments, we also employ sampled sequences for wet lab design of DNA nanoclusters, resulting in up to 16.1-fold enrichment of rare-property nanoclusters compared to their abundance in training data, demonstrating the practical utility of our framework.
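The abstract describes two steps. First, the property space is approximated locally by a nearest-neighbor graph. Second, an isometric regularizer makes latent distances agree with property distances. The minimal NumPy sketch below rests on two assumptions that the source does not state. The distance is Euclidean on property vectors, whereas the abstract only asks for "an appropriately defined distance measure". The regularizer is taken to be the mean squared gap between latent and property distances over the graph's edges. The function names are hypothetical.

```python
import numpy as np


def knn_property_graph(props: np.ndarray, k: int) -> np.ndarray:
    """Return an (n, k) array of neighbor indices in property space.

    Assumption: Euclidean distance between property vectors (e.g., binned spectra).
    """
    diff = props[:, None, :] - props[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # (n, n) pairwise property distances
    np.fill_diagonal(dist, np.inf)            # a point is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]    # indices of the k closest points


def isometric_penalty(z: np.ndarray, props: np.ndarray, nbrs: np.ndarray) -> float:
    """Hypothetical isometric regularizer over kNN edges (i, j):
    mean of (||z_i - z_j|| - ||p_i - p_j||)^2.
    """
    i = np.repeat(np.arange(len(z)), nbrs.shape[1])  # edge source indices
    j = nbrs.ravel()                                  # edge target indices
    d_latent = np.linalg.norm(z[i] - z[j], axis=1)
    d_prop = np.linalg.norm(props[i] - props[j], axis=1)
    return float(np.mean((d_latent - d_prop) ** 2))


# Toy usage with random property vectors and latent codes.
rng = np.random.default_rng(0)
props = rng.random((6, 3))
nbrs = knn_property_graph(props, k=2)
z = rng.normal(size=(6, 2))
print(isometric_penalty(z, props, nbrs))
```

A latent layout that copies the property distances exactly along graph edges incurs zero penalty. In VAE training, a term like this would presumably be added to the reconstruction and KL losses with some weighting factor. That combination is an assumption here, not a detail given in the abstract.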
Problem

Research questions and friction points this paper is trying to address.

Optimizing high-dimensional functional properties in biological sequences
Learning geometry-preserving latent embeddings for sequence design
Enabling rational design of DNA and peptides with desired properties
Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometry-preserving variational autoencoder framework
Graph neural network encoder layers
Isometric regularizer for property-organized latent space
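The first two contributions above are a VAE whose encoder passes messages over the property graph. The source does not specify the layer type, so the sketch below substitutes a single GraphSAGE-style mean-aggregation layer. Each node (a training sequence) combines its own features with the average of its property-graph neighbors, and the result is projected to a Gaussian posterior that is sampled with the reparameterization trick. The following choices are all illustrative assumptions: the one-hot DNA input, the layer sizes, the ring-shaped stand-in graph, and the function and parameter names.

```python
import numpy as np


def mean_aggregate_layer(h, nbrs, w_self, w_nbr):
    """One message-passing layer with mean neighbor aggregation (assumed type).

    h: (n, d_in) node features; nbrs: (n, k) neighbor indices.
    """
    nbr_mean = h[nbrs].mean(axis=1)  # (n, d_in): average of each node's neighbors
    return np.maximum(0.0, h @ w_self + nbr_mean @ w_nbr)  # ReLU activation


def encode(x, nbrs, params, rng):
    """Graph-aware Gaussian encoder: returns sampled z plus (mu, logvar)."""
    h = mean_aggregate_layer(x, nbrs, params["w_self"], params["w_nbr"])
    mu = h @ params["w_mu"]
    logvar = h @ params["w_logvar"]
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps  # reparameterization trick
    return z, mu, logvar


def kl_to_standard_normal(mu, logvar):
    """Mean per-sequence KL(q(z|x) || N(0, I)) for a diagonal Gaussian."""
    return float(np.mean(-0.5 * np.sum(1 + logvar - mu ** 2 - np.exp(logvar), axis=1)))


# Toy usage: 5 one-hot DNA sequences of length 4 (4 positions x 4 bases = 16 features).
rng = np.random.default_rng(0)
n, d_in, d_hid, d_lat = 5, 16, 8, 2
x = np.eye(4)[rng.integers(0, 4, size=(n, 4))].reshape(n, d_in)
nbrs = np.array([[(i - 1) % n, (i + 1) % n] for i in range(n)])  # stand-in graph
params = {
    "w_self": rng.normal(scale=0.1, size=(d_in, d_hid)),
    "w_nbr": rng.normal(scale=0.1, size=(d_in, d_hid)),
    "w_mu": rng.normal(scale=0.1, size=(d_hid, d_lat)),
    "w_logvar": rng.normal(scale=0.1, size=(d_hid, d_lat)),
}
z, mu, logvar = encode(x, nbrs, params, rng)
print(z.shape, kl_to_standard_normal(mu, logvar))
```

Keeping the self weights and the neighbor weights separate lets a node's own sequence still dominate when its neighbors differ. In a real model, the neighbor indices would come from a property-space kNN graph rather than the ring used here.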
Elham Sadeghi
Department of Computer Science, University at Albany-SUNY, Albany, NY, USA
Xianqi Deng
Department of Computer Science, University at Albany-SUNY, Albany, NY, USA
I-Hsin Lin
University of California, Irvine, Irvine, CA, USA
Stacy M. Copp
University of California, Irvine, Irvine, CA, USA
Petko Bogdanov
University at Albany-SUNY
Data Mining, Data Science, Materials Informatics, Wireless Networks