Comparative Analysis of Formula and Structure Prediction from Tandem Mass Spectra

📅 2026-01-02
🤖 AI Summary
Current spectral libraries have limited coverage, leaving a substantial fraction of LC-MS/MS signals unannotated. To address this, the study systematically evaluates state-of-the-art molecular formula and structure prediction algorithms, in contrast to previously published assessments, which used different datasets and evaluation procedures. It reports the accuracy of formula prediction and structure prediction for different adduct types. The work establishes realistic performance baselines and identifies critical bottlenecks in existing methods. These findings are intended to guide the selection of prediction workflows for practical applications and to point to areas for methodological improvement in identifying unknown compounds in metabolomics and exposomics research.

📝 Abstract
Liquid chromatography-mass spectrometry (LC-MS)-based metabolomics and exposomics aim to measure detectable small molecules in biological samples. The results facilitate hypothesis-generating discovery of metabolic changes and disease mechanisms and provide information about environmental exposures and their effects on human health. Metabolomics and exposomics are made possible by the high resolving power of LC and the high mass measurement accuracy of MS. However, the majority of signals from such studies still cannot be identified or annotated by conventional library searching, because existing spectral libraries are far from covering the vast chemical space captured by LC-MS/MS. To address this challenge and unleash the full potential of metabolomics and exposomics, a number of computational approaches have been developed to predict compounds from tandem mass spectra. Published assessments of these approaches used different datasets and evaluation procedures. To select prediction workflows for practical applications and identify areas for further improvement, we carried out a systematic evaluation of state-of-the-art prediction algorithms. Specifically, the accuracy of formula prediction and structure prediction was evaluated for different types of adducts. The findings establish realistic performance baselines, identify critical bottlenecks, and provide guidance for further improving MS-based compound prediction.
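
As a rough illustration of what evaluating accuracy "for different types of adducts" can look like, the sketch below computes top-k accuracy grouped by adduct. It is not the authors' evaluation code: the function name `top_k_accuracy_by_adduct`, the record layout, and the toy records are all hypothetical, and top-k accuracy is assumed here as one common metric rather than taken from the paper.

```python
from collections import defaultdict


def top_k_accuracy_by_adduct(records, k=1):
    """Return {adduct: fraction of spectra whose true label is among the top-k candidates}.

    Each record is a tuple (adduct, true_label, ranked_candidates), where
    ranked_candidates is ordered from best to worst score.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for adduct, truth, ranked in records:
        totals[adduct] += 1
        if truth in ranked[:k]:
            hits[adduct] += 1
    return {adduct: hits[adduct] / totals[adduct] for adduct in totals}


# Hypothetical toy records, invented for illustration only:
# (adduct, true formula, ranked candidate formulas from some predictor)
records = [
    ("[M+H]+", "C6H12O6", ["C6H12O6", "C5H8N4O3"]),
    ("[M+H]+", "C9H11NO2", ["C8H7N3O", "C9H11NO2"]),
    ("[M+Na]+", "C6H12O6", ["C7H16O3S", "C4H8N6O2"]),
    ("[M-H]-", "C7H6O3", ["C7H6O3"]),
]

top1 = top_k_accuracy_by_adduct(records, k=1)
top2 = top_k_accuracy_by_adduct(records, k=2)
print(top1)
print(top2)
```

Reporting the metric per adduct rather than pooled keeps a well-represented adduct from masking poor performance on a rarer one. The same function applies to structure prediction if the labels are structure identifiers instead of formulas.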
Problem

Research questions and friction points this paper is trying to address.

metabolomics
exposomics
tandem mass spectrometry
compound identification
spectral library
Innovation

Methods, ideas, or system contributions that make the work stand out.

tandem mass spectrometry
formula prediction
structure prediction
metabolomics
computational evaluation
Xujun Che
University of North Carolina at Charlotte
Xiuxia Du
University of North Carolina at Charlotte
Depeng Xu
University of North Carolina at Charlotte
Machine Learning, Data Privacy, Fairness