MemNovo: Look Back at the Spectrum for Balanced De Novo Peptide Sequencing from Mass Spectrometry

📅 2026-06-10
🤖 AI Summary
This work addresses a limitation of existing Transformer-based de novo peptide sequencing methods: during autoregressive decoding they rely too heavily on the prior from already generated tokens and under-use spectral evidence, producing predictions that are biologically plausible yet inconsistent with the spectrum. To mitigate this, the authors propose MemNovo, a training-free, plug-and-play inference mechanism that rebalances sequence priors and spectral evidence by building a persistent spectral memory bank and injecting retrieved features into the final decoding stage through an ultra-conservative residual connection. The authors identify a progressive decline in the decoder's use of spectral information and argue, via theoretical analysis, that MemNovo restores the mutual information between the decoder state and the raw spectrum. On the Nine Species benchmark, MemNovo improves both amino acid precision and peptide precision, with relative gains in peptide precision of up to 39.1% for Casanovo and up to 3.9% for InstaNovo, at negligible computational overhead.
📝 Abstract
De novo peptide sequencing from tandem mass spectrometry is pivotal in proteomics, enabling identification of novel peptides without reference databases. While recent Transformer-based encoder-decoder models have achieved remarkable performance, we uncover a critical pathology in their inference dynamics. Through comprehensive feature scaling experiments, we demonstrate that existing auto-regressive peptide decoders tend to over-rely on generated-sequence priors while progressively under-utilizing fine-grained physical evidence from the input mass spectrum. This phenomenon leads to suboptimal results, where generated peptide sequences are biologically plausible yet not faithful to the input spectrum. To rectify this, we propose MemNovo, a training-free and plug-and-play mechanism that re-balances peptide and spectral contributions at inference time. MemNovo alleviates the information bottleneck by establishing a persistent spectral memory bank and injecting retrieved features directly into the final decoding stage via an ultra-conservative residual connection. Theoretical analysis confirms that this mechanism restores the mutual information between the decoder state and the raw spectrum. Extensive experiments on the Nine Species benchmark with two representative baselines, Casanovo and InstaNovo, demonstrate that MemNovo consistently improves both amino acid precision and peptide precision, achieving up to 39.1% relative improvement in peptide precision for Casanovo and up to 3.9% for InstaNovo, with negligible computational overhead.
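The mechanism described in the abstract (a persistent spectral memory bank whose retrieved features are added to the final decoder state through a very small residual weight) can be illustrated with a minimal sketch. The abstract does not specify how the memory is built, how retrieval is scored, or how large the residual scale is, so everything below is an assumption made for illustration: the function names `retrieve_from_memory` and `rebalance`, the scaled dot-product retrieval, and the default `alpha=0.05` are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve_from_memory(query, memory):
    """Read from the memory bank with scaled dot-product attention.

    `memory` stands in for cached spectrum-encoder features (one row per
    peak); using attention for retrieval is an assumption of this sketch.
    """
    dim = len(query)
    scale = math.sqrt(dim)
    scores = [sum(q * k for q, k in zip(query, row)) / scale for row in memory]
    weights = softmax(scores)
    return [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(dim)]


def rebalance(decoder_state, memory, alpha=0.05):
    """Add retrieved spectral features to the final decoder state.

    `alpha` is a hypothetical small residual weight: the original state
    dominates and the spectral read acts only as a gentle correction.
    """
    retrieved = retrieve_from_memory(decoder_state, memory)
    return [h + alpha * r for h, r in zip(decoder_state, retrieved)]


# Toy usage: two cached spectral features and one decoder state.
memory_bank = [[1.0, 0.0], [0.0, 1.0]]
state = [2.0, 0.0]
adjusted = rebalance(state, memory_bank, alpha=0.05)
```

Because the memory bank is computed once per spectrum and only read at each decoding step, a scheme of this shape needs no retraining. Setting `alpha` to zero recovers the unmodified decoder state, which is one way to read "ultra-conservative" in the abstract.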
Problem

Research questions and friction points this paper is trying to address.

de novo peptide sequencing
mass spectrometry
sequence-spectrum fidelity
auto-regressive decoding
spectral evidence
Innovation

Methods, ideas, or system contributions that make the work stand out.

de novo peptide sequencing
mass spectrometry
spectral memory
inference rebalancing
Transformer decoder