Comparative Analysis of Formula and Structure Prediction from Tandem Mass Spectra

2026-01-02

AI Summary
Current spectral libraries have limited coverage, leaving a substantial fraction of LC-MS/MS signals unannotated. To address this challenge, this study systematically evaluates multiple state-of-the-art molecular formula and structure prediction algorithms, and reports prediction accuracy separately for different adduct types. The work establishes realistic performance baselines and identifies critical bottlenecks in existing methods. These findings offer guidance for selecting prediction workflows and for improving the identification of unknown compounds in metabolomics and exposomics research.

Abstract
Liquid chromatography-mass spectrometry (LC-MS)-based metabolomics and exposomics aim to measure detectable small molecules in biological samples. The results facilitate hypothesis-generating discovery of metabolic changes and disease mechanisms and provide information about environmental exposures and their effects on human health. Metabolomics and exposomics are made possible by the high resolving power of LC and the high mass measurement accuracy of MS. However, a majority of the signals from such studies still cannot be identified or annotated using conventional library searching because existing spectral libraries are far from covering the vast chemical space captured by LC-MS/MS. To address this challenge and unleash the full potential of metabolomics and exposomics, a number of computational approaches have been developed to predict compounds based on tandem mass spectra. Published assessments of these approaches used different datasets and evaluation criteria, which makes their reported results difficult to compare. To select prediction workflows for practical applications and identify areas for further improvement, we have carried out a systematic evaluation of state-of-the-art prediction algorithms. Specifically, the accuracy of formula prediction and structure prediction was evaluated for different types of adducts. The resulting findings have established realistic performance baselines, identified critical bottlenecks, and provided guidance to further improve compound predictions based on MS.
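As an illustration of what a per-adduct accuracy evaluation can look like, the following minimal Python sketch computes top-k accuracy of formula predictions grouped by adduct type. It is not the authors' evaluation code. The function name `top_k_accuracy_by_adduct`, the record layout, and the example records are all hypothetical, and the sketch assumes that formulas are written in one canonical string form so that exact string comparison is meaningful.

```python
from collections import defaultdict


def top_k_accuracy_by_adduct(records, k=1):
    """Return {adduct: fraction of spectra whose true formula is in the top k}.

    records: iterable of (adduct, true_formula, ranked_candidates), where
    ranked_candidates is a list of formula strings, best candidate first.
    Assumption: all formulas use the same canonical notation.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for adduct, true_formula, ranked_candidates in records:
        totals[adduct] += 1
        # A hit means the true formula appears among the first k candidates.
        if true_formula in ranked_candidates[:k]:
            hits[adduct] += 1
    return {adduct: hits[adduct] / totals[adduct] for adduct in totals}


# Made-up records for illustration only; these are not data from the study.
example_records = [
    ("[M+H]+", "C6H12O6", ["C6H12O6", "C5H8N4O3"]),
    ("[M+H]+", "C9H11NO2", ["C8H7N3O", "C9H11NO2"]),
    ("[M+Na]+", "C6H12O6", ["C4H8N4O4", "C7H16O3S"]),
]

top1 = top_k_accuracy_by_adduct(example_records, k=1)
top2 = top_k_accuracy_by_adduct(example_records, k=2)
print(top1)
print(top2)
```

Reporting accuracy per adduct rather than as one pooled number keeps a well-represented adduct type from masking weaker performance on less common ones.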
Problem

Research questions and friction points this paper is trying to address.

metabolomics
exposomics
tandem mass spectrometry
compound identification
spectral library
Innovation

Methods, ideas, or system contributions that make the work stand out.

tandem mass spectrometry
formula prediction
structure prediction
metabolomics
computational evaluation
Xujun Che
University of North Carolina at Charlotte
Xiuxia Du
University of North Carolina at Charlotte
Depeng Xu
University of North Carolina at Charlotte
Machine Learning, Data Privacy, Fairness