🤖 AI Summary
Standard database search in proteomics suffers from a combinatorial explosion of the search space, which limits identification of uncharacterized or complex post-translational modifications (PTMs). To address this, we introduce OmniNovo, a reference-free deep learning framework for de novo sequencing of unmodified and modified peptides directly from tandem mass spectra. Rather than being restricted to specific modification types, a single model learns shared fragmentation rules that cover diverse PTMs. The method combines a mass-constrained decoding algorithm with rigorous false discovery rate (FDR) estimation. At 1% FDR, it identifies 51% more peptides than standard approaches and generalizes to biological sites that were not seen during training. This brings previously inaccessible “dark matter” of the proteome into view and broadens the range of PTMs that can be detected.
📝 Abstract
Post-translational modifications (PTMs) serve as a dynamic chemical language regulating protein function, yet current proteomic methods remain blind to a vast portion of the modified proteome. Standard database search algorithms suffer from a combinatorial explosion of search spaces, limiting the identification of uncharacterized or complex modifications. Here we introduce OmniNovo, a unified deep learning framework for reference-free sequencing of unmodified and modified peptides directly from tandem mass spectra. Unlike existing tools restricted to specific modification types, OmniNovo learns universal fragmentation rules to decipher diverse PTMs within a single coherent model. By integrating a mass-constrained decoding algorithm with rigorous false discovery rate estimation, OmniNovo achieves state-of-the-art accuracy, identifying 51% more peptides than standard approaches at a 1% false discovery rate. Crucially, the model generalizes to biological sites unseen during training, illuminating the dark matter of the proteome and enabling unbiased comprehensive analysis of cellular regulation.
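The abstract does not describe how the mass-constrained decoding works internally. As a rough illustration of the general idea only, the sketch below restricts the next candidate token to those that would not push the peptide past the measured precursor mass. The token vocabulary (including the `S[Phospho]` label), the function names, and the 10 ppm tolerance are all assumptions made for this example and are not taken from OmniNovo.

```python
# Monoisotopic residue masses (Da) for a tiny illustrative vocabulary.
# Token labels such as "S[Phospho]" are hypothetical, not OmniNovo's own.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "K": 128.09496,
    "S[Phospho]": 87.03203 + 79.96633,  # serine plus a phosphate group
}
WATER = 18.010565  # added once per peptide for the free termini


def prefix_mass(tokens):
    """Sum of residue masses for a partial sequence (no terminal water)."""
    return sum(RESIDUE_MASS[t] for t in tokens)


def peptide_mass(tokens):
    """Neutral monoisotopic mass of a complete peptide."""
    return prefix_mass(tokens) + WATER


def within_tolerance(tokens, precursor_mass, tol_ppm=10.0):
    """True if the complete peptide matches the neutral precursor mass."""
    allowed_error = precursor_mass * tol_ppm * 1e-6
    return abs(peptide_mass(tokens) - precursor_mass) <= allowed_error


def allowed_next_tokens(prefix, precursor_mass, tol_ppm=10.0):
    """Tokens that can be appended without overshooting the precursor mass."""
    upper = precursor_mass * (1 + tol_ppm * 1e-6)
    budget = upper - WATER - prefix_mass(prefix)
    return [t for t, m in RESIDUE_MASS.items() if m <= budget]
```

In an autoregressive decoder, a filter of this kind would typically be applied as a mask over the model's next-token scores at each step, with `within_tolerance` used as a final check on completed candidates. Here `precursor_mass` is assumed to be the neutral monoisotopic mass already derived from the observed m/z and charge.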