Accurate de novo sequencing of the modified proteome with OmniNovo

📅 2025-12-13
🤖 AI Summary
Standard database search suffers from a combinatorial explosion of the search space, which limits identification of uncharacterized or complex post-translational modifications (PTMs). OmniNovo is a unified deep learning framework for reference-free de novo sequencing of unmodified and modified peptides directly from tandem mass spectra. Rather than targeting specific modification types, it learns universal fragmentation rules so that a single model can interpret diverse PTMs. The method combines mass-constrained decoding with rigorous false discovery rate (FDR) estimation. At a 1% FDR it identifies 51% more peptides than standard approaches, and it generalizes to biological modification sites unseen during training, exposing the “dark matter” of the proteome.

📝 Abstract
Post-translational modifications (PTMs) serve as a dynamic chemical language regulating protein function, yet current proteomic methods remain blind to a vast portion of the modified proteome. Standard database search algorithms suffer from a combinatorial explosion of search spaces, limiting the identification of uncharacterized or complex modifications. Here we introduce OmniNovo, a unified deep learning framework for reference-free sequencing of unmodified and modified peptides directly from tandem mass spectra. Unlike existing tools restricted to specific modification types, OmniNovo learns universal fragmentation rules to decipher diverse PTMs within a single coherent model. By integrating a mass-constrained decoding algorithm with rigorous false discovery rate estimation, OmniNovo achieves state-of-the-art accuracy, identifying 51% more peptides than standard approaches at a 1% false discovery rate. Crucially, the model generalizes to biological sites unseen during training, illuminating the dark matter of the proteome and enabling unbiased comprehensive analysis of cellular regulation.
Problem

Research questions and friction points this paper is trying to address.

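Current proteomic methods miss a large portion of the modified proteome.
Standard database search suffers a combinatorial explosion of the search space, limiting identification of uncharacterized or complex PTMs.
Existing tools are restricted to specific modification types, which prevents unbiased analysis of diverse PTMs.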
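
To make the combinatorial explosion concrete, the sketch below counts how many modification states a single peptide can take when each residue may independently carry any one of its allowed variable modifications. The modification table and the function name `count_modified_forms` are illustrative assumptions and do not come from the paper.

```python
from math import prod

# Hypothetical variable-modification table (an assumption for illustration):
# residue -> modifications a search engine is asked to consider on it.
VARIABLE_MODS = {
    "S": ["Phospho"],
    "T": ["Phospho"],
    "Y": ["Phospho"],
    "M": ["Oxidation"],
    "K": ["Acetyl", "Methyl"],
}


def count_modified_forms(peptide: str, mods: dict[str, list[str]] = VARIABLE_MODS) -> int:
    """Count distinct modification states of a peptide, including the unmodified one."""
    # Each residue is either unmodified (1 option) or carries one allowed
    # modification, so the per-residue option counts multiply.
    return prod(1 + len(mods.get(aa, [])) for aa in peptide)


if __name__ == "__main__":
    for pep in ["PEPTIDE", "STYMK", "SKSKSKSKSK"]:
        print(pep, count_modified_forms(pep))
```

With this table, the five-residue peptide `STYMK` already has 2 × 2 × 2 × 2 × 3 = 48 states, and the count grows multiplicatively with every additional modifiable residue or modification type, which is the growth a database search must enumerate.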
Innovation

Methods, ideas, or system contributions that make the work stand out.

OmniNovo uses deep learning for reference-free peptide sequencing.
It learns universal fragmentation rules for diverse modifications.
It integrates mass-constrained decoding with false discovery rate estimation.
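
The summary does not describe how OmniNovo implements mass-constrained decoding, so the following is only a minimal sketch of the general idea: at each autoregressive step, tokens whose mass would overshoot the remaining precursor mass are masked out, and the end-of-sequence token is allowed only once the remaining mass is within tolerance. The tiny alphabet, the tolerance, the greedy search, and the `score_fn` stand-in for a trained model are all assumptions; the function also takes the total residue mass directly rather than deriving it from precursor m/z and charge.

```python
# Monoisotopic residue masses in Da for a deliberately tiny alphabet.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "S[Phospho]": 87.03203 + 79.96633,  # serine plus HPO3
}
EOS = "<eos>"


def allowed_tokens(remaining: float, tol: float = 0.02) -> list[str]:
    """Return tokens that keep the prefix consistent with the remaining mass."""
    allowed = [tok for tok, mass in RESIDUE_MASS.items() if mass <= remaining + tol]
    if abs(remaining) <= tol:
        allowed.append(EOS)  # stopping is valid only when the mass is used up
    return allowed


def greedy_decode(score_fn, residue_mass_total: float, tol: float = 0.02, max_len: int = 50):
    """Greedy decoding restricted to mass-feasible tokens.

    score_fn(prefix) is a hypothetical stand-in for a model: it returns a
    dict mapping tokens to scores for the next position.
    Returns the decoded token list, or None if decoding reaches a dead end.
    """
    prefix: list[str] = []
    remaining = residue_mass_total
    for _ in range(max_len):
        candidates = allowed_tokens(remaining, tol)
        if not candidates:
            return None  # no residue fits and the mass is not yet matched
        scores = score_fn(prefix)
        best = max(candidates, key=lambda tok: scores.get(tok, float("-inf")))
        if best == EOS:
            return prefix
        prefix.append(best)
        remaining -= RESIDUE_MASS[best]
    return None


if __name__ == "__main__":
    # Toy scorer with a fixed preference order, ignoring the prefix.
    toy = lambda prefix: {"S[Phospho]": 3.0, "A": 2.0, "G": 1.0, "S": 0.0, EOS: -1.0}
    total = RESIDUE_MASS["S[Phospho]"] + RESIDUE_MASS["A"] + RESIDUE_MASS["G"]
    print(greedy_decode(toy, total))
```

Greedy search can commit to a residue that leaves an unreachable remainder, in which case this sketch returns `None`; practical decoders typically use beam search or a precomputed mass-feasibility table to avoid such dead ends.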
Yuhan Chen
Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University, Shanghai, China.
Shang Qu
Tsinghua University
AI4Bio
Zhiqiang Gao
Shanghai Artificial Intelligence Laboratory, Shanghai, China.
Yuejin Yang
Shanghai Artificial Intelligence Laboratory, Shanghai, China; Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China.
Xiang Zhang
Department of Computer Science, University of British Columbia, Vancouver, Canada.
Sheng Xu
Shanghai Artificial Intelligence Laboratory, Shanghai, China; Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China.
Xinjie Mao
Shanghai Artificial Intelligence Laboratory, Shanghai, China; School of Medicine, Westlake University, Hangzhou, Zhejiang, China; Shanghai Innovation Institute, Shanghai, China.
Liujia Qian
School of Medicine, Westlake University, Hangzhou, Zhejiang, China.
Jiaqi Wei
PhD student, Zhejiang University
NLP, LLM, AI for Science
Zijie Qiu
Shanghai Artificial Intelligence Laboratory, Shanghai, China.
Chenyu You
Assistant Professor, Stony Brook University
Machine Learning, AI for Health, Computer Vision, Medical Image Analysis, Multimedia
Lei Bai
Shanghai AI Laboratory
Foundation Model, Science Intelligence, Multi-Agent System, Autonomous Discovery
Ning Ding
Shanghai Artificial Intelligence Laboratory, Shanghai, China; Department of Electronic Engineering, Tsinghua University, Beijing, China.
Tiannan Guo
Guomics, Westlake University
proteomics, AI, medicine
Bowen Zhou
Shanghai Artificial Intelligence Laboratory, Shanghai, China; Department of Electronic Engineering, Tsinghua University, Beijing, China.
Siqi Sun
Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China.