Unlocking High-Fidelity Molecular Generation from Mass Spectra via Dual-Stream Line Graph Diffusion

📅 2026-05-07

🤖 AI Summary
This work addresses the information synchronization bottleneck in de novo molecular generation from mass spectra, which arises from the cyclic dependency between atom and bond inference. The authors propose DualLGD, a dual-stream graph diffusion architecture that explicitly decouples the representation spaces of atoms and bonds. The bond space is constructed via a line graph, and a bidirectional cross-attention mechanism constrained by atom-bond incidence synchronizes the two streams as denoising alternates between atom-level and bond-level reasoning. On the NPLIB1 and MassSpecGym benchmarks, DualLGD achieves Top-1 accuracy of 34.37% and 23.89%, respectively, approximately three times the previous state of the art, and it surpasses the previous best fully pre-trained model without any pre-training.
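To make the line-graph construction concrete, the following is a minimal sketch, not the authors' implementation. It assumes a molecule is given as a list of bonds, each a pair of atom indices; the function name `line_graph` and this input format are illustrative choices. Each bond becomes a node, and two bond-nodes are connected when the bonds share an atom.

```python
from itertools import combinations


def line_graph(bonds):
    """Return line-graph edges: pairs of bond indices whose bonds share an atom.

    `bonds` is a list of (atom_u, atom_v) pairs (hypothetical input format).
    """
    incident = {}  # atom index -> indices of bonds touching that atom
    for b, (u, v) in enumerate(bonds):
        incident.setdefault(u, []).append(b)
        incident.setdefault(v, []).append(b)

    edges = set()
    for bond_ids in incident.values():
        # Every pair of bonds meeting at the same atom is adjacent in the line graph.
        for i, j in combinations(sorted(bond_ids), 2):
            edges.add((i, j))
    return sorted(edges)


# A three-atom chain 0-1-2: its two bonds meet at atom 1 (one bond angle).
chain = [(0, 1), (1, 2)]
print(line_graph(chain))  # [(0, 1)]

# A three-membered ring: every pair of bonds shares an atom.
ring = [(0, 1), (1, 2), (2, 0)]
print(line_graph(ring))  # [(0, 1), (0, 2), (1, 2)]
```

In this representation a bond angle is a single line-graph edge, and the three-membered ring maps to a triangle of bond-nodes, which illustrates how such structures become local motifs in the bond space.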
📝 Abstract
De novo molecular generation from tandem mass spectra is a challenging inverse problem whose core difficulty lies in the circular dependency between atom-level and bond-level reasoning: determining a bond's type requires knowing its endpoint atoms' chemical environment, yet an atom's environment is in turn defined by its incident bonds. Existing graph diffusion methods process atoms and bonds within a single computation stream, where atom-bond information synchronization can only occur implicitly across layers. We argue that this single-stream paradigm, rather than the choice of any particular aggregation kernel, is a key architectural bottleneck. We propose DualLGD (Dual-stream Line Graph Diffusion), which reformulates molecular graph denoising as the alternating solution of two coupled subproblems: atom-level reasoning and bond-level reasoning, each operating in its own dedicated representation space. The line graph provides a natural mathematical construction for the bond space, in which bond angles, dihedrals, conjugation chains, and rings correspond to local topological motifs between bonds. Incidence-constrained bidirectional cross-attention synchronizes the two streams at every layer, ensuring that each atom attends only to its incident bonds and vice versa, respecting the fundamental chemical principle that an atom's environment is determined by its bonding context. On the NPLIB1 and MassSpecGym benchmarks, DualLGD achieves top-1 accuracy of 34.37% and 23.89%, approximately 3× the previous state of the art. Ablation studies confirm the architecture as the primary source of improvement: DualLGD without any pre-training already surpasses the previous best fully pretrained model.
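The incidence constraint on cross-attention can be sketched as a masked attention step. The code below is an illustrative assumption, not the paper's architecture: it uses single-head dot-product attention with no learned projections, and the helper names (`incidence_mask`, `masked_attention`, `transpose`) and the toy feature vectors are hypothetical. The mask lets an atom see only the bonds it touches, and the transposed mask lets a bond see only its two endpoint atoms.

```python
import math


def incidence_mask(num_atoms, bonds):
    """mask[a][b] is True when atom a is an endpoint of bond b."""
    mask = [[False] * len(bonds) for _ in range(num_atoms)]
    for b, (u, v) in enumerate(bonds):
        mask[u][b] = True
        mask[v][b] = True
    return mask


def transpose(mask):
    """Swap the roles of queries and keys (atom->bond mask becomes bond->atom)."""
    return [list(col) for col in zip(*mask)]


def masked_attention(queries, keys, values, mask):
    """Dot-product attention where query i may attend to key j only if mask[i][j]."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q, row in zip(queries, mask):
        allowed = [j for j, ok in enumerate(row) if ok]
        if not allowed:  # isolated node: nothing to attend to
            out.append([0.0] * dim)
            continue
        scores = [sum(a * b for a, b in zip(q, keys[j])) / scale for j in allowed]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w / total * values[j][k] for w, j in zip(weights, allowed))
            for k in range(dim)
        ])
    return out


# Toy chain 0-1-2 with made-up 2-dimensional features.
bonds = [(0, 1), (1, 2)]
atom_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bond_feats = [[0.5, 0.5], [2.0, -1.0]]

mask = incidence_mask(len(atom_feats), bonds)
atom_update = masked_attention(atom_feats, bond_feats, bond_feats, mask)
bond_update = masked_attention(bond_feats, atom_feats, atom_feats, transpose(mask))
print(atom_update[0])  # atom 0 touches only bond 0, so it receives bond 0's features
```

Because atom 0 is incident to a single bond, its attention weight on that bond is exactly 1 and its update equals that bond's feature vector. A full model would presumably add learned query, key, and value projections and multiple heads on top of this masking pattern.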
Problem

Research questions and friction points this paper is trying to address.

molecular generation
mass spectra
atom-bond dependency
graph diffusion
de novo structure elucidation
Innovation

Methods, ideas, or system contributions that make the work stand out.

dual-stream diffusion
line graph
molecular generation
mass spectra
cross-attention
Xujun Che
Department of Software and Information Systems, University of North Carolina at Charlotte, Charlotte, NC 28223; Center for Environmental Monitoring and Informatics Technologies for Public Health, University of North Carolina at Charlotte, Charlotte, NC 28223
Xiuxia Du
Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223; Center for Environmental Monitoring and Informatics Technologies for Public Health, University of North Carolina at Charlotte, Charlotte, NC 28223
Depeng Xu
University of North Carolina at Charlotte
Machine Learning, Data Privacy, Fairness