🤖 AI Summary
De novo elucidation of molecular structures from NMR spectra is challenging because spectra are complex and chemical space is vast. DiffNMR is an end-to-end framework built on a conditional discrete diffusion model that refines molecular graphs iteratively, which keeps generated structures globally consistent and mitigates the error accumulation of autoregressive decoding. The method uses a two-stage pretraining strategy to align spectral and molecular representations: (i) a diffusion autoencoder (Diff-AE), and (ii) spectrum–molecule contrastive learning. At inference it adds retrieval-based initialization and similarity filtering, and its NMR encoder applies radial basis function (RBF) encoding to chemical shifts to preserve their continuity and chemical correlation. According to the reported experiments, DiffNMR achieves competitive performance on NMR-based structure elucidation while remaining efficient and robust.
📝 Abstract
Nuclear Magnetic Resonance (NMR) spectroscopy is a central characterization method for molecular structure elucidation, yet deducing molecular structures from NMR spectra remains challenging due to the complexity of spectral data and the vastness of chemical space. In this work, we introduce DiffNMR, a novel end-to-end framework that leverages a conditional discrete diffusion model for de novo molecular structure elucidation from NMR spectra. DiffNMR refines molecular graphs iteratively through a diffusion-based generative process, ensuring global consistency and mitigating the error accumulation inherent in autoregressive methods. The framework integrates three components: a two-stage pretraining strategy that aligns spectral and molecular representations via a diffusion autoencoder (Diff-AE) and contrastive learning; retrieval initialization and similarity filtering during inference; and a specialized NMR encoder that applies radial basis function (RBF) encoding to chemical shifts, preserving continuity and chemical correlation. Experimental results demonstrate that DiffNMR achieves competitive performance for NMR-based structure elucidation, offering an efficient and robust solution for automated molecular analysis.
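
To illustrate why RBF encoding preserves the continuity of chemical shifts, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the basis, so the Gaussian form, the function name `rbf_encode`, the 0 to 220 ppm grid of centers, and the width parameter `gamma` are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_encode(shifts_ppm, centers, gamma):
    """Expand each chemical shift into a vector of Gaussian RBF activations.

    Hypothetical helper: the Gaussian basis and its parameters are
    illustrative assumptions, not taken from the DiffNMR paper.
    """
    shifts = np.asarray(shifts_ppm, dtype=float).reshape(-1, 1)  # (n_peaks, 1)
    grid = np.asarray(centers, dtype=float).reshape(1, -1)       # (1, n_centers)
    # Broadcasting yields one activation per (peak, center) pair.
    return np.exp(-gamma * (shifts - grid) ** 2)                 # (n_peaks, n_centers)

# Assumed grid: centers every 2 ppm across a typical 13C range.
centers = np.linspace(0.0, 220.0, 111)
gamma = 0.5  # assumed width; larger values give narrower basis functions

# Two nearby aromatic-region shifts and one distant carbonyl-region shift.
features = rbf_encode([128.5, 129.0, 170.2], centers, gamma)

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

near = cosine(features[0], features[1])  # shifts 0.5 ppm apart
far = cosine(features[0], features[2])   # shifts about 42 ppm apart
print(features.shape, near > far)
```

Under these assumptions, shifts that are close in ppm activate overlapping basis functions and receive similar feature vectors, whereas a one-hot binning would treat 128.5 ppm and 129.0 ppm as unrelated whenever they fall into different bins.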