Accurate de novo sequencing of the modified proteome with OmniNovo

📅 2025-12-13

🤖 AI Summary

Database searching in current proteomics methods suffers from a combinatorial explosion of the search space, which limits identification of uncharacterized or complex post-translational modifications (PTMs). To address this, the authors present OmniNovo, a reference-free deep learning framework for de novo sequencing of unmodified and modified peptides directly from tandem mass spectra. Rather than being restricted to specific modification types, it learns fragmentation rules shared across diverse PTMs within a single model. The method combines a mass-constrained decoding algorithm with rigorous false discovery rate (FDR) estimation. At a 1% FDR, it identifies 51% more peptides than standard approaches, and it generalizes to biological modification sites absent from the training data. This exposes part of the previously inaccessible “dark matter” of the proteome and widens the range of PTMs that can be analyzed without bias.

📝 Abstract

Post-translational modifications (PTMs) serve as a dynamic chemical language regulating protein function, yet current proteomic methods remain blind to a vast portion of the modified proteome. Standard database search algorithms suffer from a combinatorial explosion of search spaces, limiting the identification of uncharacterized or complex modifications. Here we introduce OmniNovo, a unified deep learning framework for reference-free sequencing of unmodified and modified peptides directly from tandem mass spectra. Unlike existing tools restricted to specific modification types, OmniNovo learns universal fragmentation rules to decipher diverse PTMs within a single coherent model. By integrating a mass-constrained decoding algorithm with rigorous false discovery rate estimation, OmniNovo achieves state-of-the-art accuracy, identifying 51% more peptides than standard approaches at a 1% false discovery rate. Crucially, the model generalizes to biological sites unseen during training, illuminating the dark matter of the proteome and enabling unbiased comprehensive analysis of cellular regulation.
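The abstract does not spell out how the mass-constrained decoding works, so the following is only a minimal sketch of the general idea, not OmniNovo's implementation. It assumes a beam search over residue tokens in which any token that would overshoot the precursor's remaining mass budget is masked out, and a sequence is accepted only when its total mass matches the precursor within a tolerance. The function names, the four-token vocabulary, and `toy_score` (a stand-in for a neural network's per-token log score) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Monoisotopic residue masses in daltons. "S[Phospho]" is serine plus HPO3 (+79.96633).
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "S[Phospho]": 166.99836,
}
WATER = 18.01056  # added once per peptide (terminal H and OH)

Hypothesis = Tuple[float, List[str], float]  # (log score, tokens, residue mass so far)


def allowed_tokens(prefix_mass: float, target_mass: float, tol: float) -> List[str]:
    """Return the tokens that do not overshoot the remaining mass budget."""
    remaining = target_mass - prefix_mass
    return [tok for tok, mass in RESIDUE_MASS.items() if mass <= remaining + tol]


def decode(
    score_fn: Callable[[List[str], str], float],
    peptide_mass: float,
    tol: float = 0.02,
    beam_width: int = 4,
    max_len: int = 30,
) -> List[Hypothesis]:
    """Beam search that only keeps sequences consistent with the peptide mass."""
    target = peptide_mass - WATER  # sum of residue masses the sequence must reach
    beams: List[Hypothesis] = [(0.0, [], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        candidates: List[Hypothesis] = []
        for score, tokens, mass in beams:
            for tok in allowed_tokens(mass, target, tol):
                new_mass = mass + RESIDUE_MASS[tok]
                item = (score + score_fn(tokens, tok), tokens + [tok], new_mass)
                if abs(new_mass - target) <= tol:
                    finished.append(item)  # mass budget is used up: sequence is complete
                else:
                    candidates.append(item)
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    finished.sort(key=lambda c: c[0], reverse=True)
    return finished


def toy_score(prefix: List[str], token: str) -> float:
    """Hypothetical scorer that prefers the sequence G, A, S[Phospho]."""
    truth = ["G", "A", "S[Phospho]"]
    pos = len(prefix)
    return 0.0 if pos < len(truth) and truth[pos] == token else -5.0
```

Calling `decode(toy_score, peptide_mass)` returns every mass-consistent hypothesis that survived the beam, best first. The point of the constraint is that a modified residue is just another token with its own mass, so the decoder never emits a sequence that contradicts the measured precursor mass.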

Problem

Research questions and friction points this paper is trying to address.

Develops a deep learning framework for reference-free peptide sequencing

Addresses combinatorial search space explosion in PTM identification

Enables unbiased analysis of diverse post-translational modifications
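The search-space explosion named in the list above can be made concrete with a small counting exercise. This is an illustrative sketch, not taken from the paper: it assumes each residue type has a fixed number of possible variable modifications and counts how many distinct modified forms a single peptide has when at most `max_mods` residues may be modified. The function name and parameters are hypothetical.

```python
from typing import Dict


def count_modified_forms(peptide: str, mods_per_residue: Dict[str, int], max_mods: int) -> int:
    """Count modified forms of one peptide, including the unmodified form.

    mods_per_residue maps an amino acid letter to the number of variable
    modifications that may be placed on it (an assumption of this sketch).
    """
    # ways[k] = number of ways to place exactly k modifications so far
    ways = [1] + [0] * max_mods
    for aa in peptide:
        options = mods_per_residue.get(aa, 0)
        if options == 0:
            continue
        # Iterate downwards so each residue carries at most one modification.
        for k in range(max_mods, 0, -1):
            ways[k] += ways[k - 1] * options
    return sum(ways)
```

With one possible modification on each of S, T and Y, the peptide `"STY"` has 2 x 2 x 2 = 8 forms when all three sites may be modified at once. A database search must score every such form for every candidate peptide, and the count grows multiplicatively with each additional modification type considered.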

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses deep learning for reference-free sequencing of unmodified and modified peptides

Learns universal fragmentation rules for diverse modifications within a single model

Integrates mass-constrained decoding with false discovery rate estimation
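The summary does not describe how OmniNovo estimates its false discovery rate. As a generic illustration only, the sketch below shows one common way a "1% FDR" cutoff is applied to a ranked list of identifications: each identification carries a confidence score and a flag marking it as a known-false (decoy) hit, and q-values are computed from the running decoy-to-target ratio. The function names and the target-decoy framing are assumptions of this example, not the paper's procedure.

```python
from typing import List, Sequence


def q_values(scores: Sequence[float], is_decoy: Sequence[bool]) -> List[float]:
    """Compute a q-value per identification from scores and decoy flags."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdr_at_rank: List[float] = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR if the threshold were placed just below this hit.
        fdr_at_rank.append(decoys / max(targets, 1))
    # q-value = smallest estimated FDR at this rank or any lower-scoring rank.
    for k in range(len(fdr_at_rank) - 2, -1, -1):
        fdr_at_rank[k] = min(fdr_at_rank[k], fdr_at_rank[k + 1])
    q = [0.0] * len(scores)
    for rank, i in enumerate(order):
        q[i] = fdr_at_rank[rank]
    return q


def accepted_targets(
    scores: Sequence[float], is_decoy: Sequence[bool], threshold: float = 0.01
) -> List[int]:
    """Indices of non-decoy identifications with q-value at or below the threshold."""
    q = q_values(scores, is_decoy)
    return [i for i in range(len(scores)) if not is_decoy[i] and q[i] <= threshold]
```

Comparing the length of `accepted_targets(...)` between two methods at the same threshold is the kind of measurement behind a statement such as "51% more peptides at a 1% false discovery rate".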
