🤖 AI Summary
This work addresses the information synchronization bottleneck in de novo molecular generation from tandem mass spectra, which arises from the cyclic dependency between atom and bond inference. To resolve this, the authors propose DualLGD, a dual-stream graph diffusion architecture that explicitly decouples the representation spaces of atoms and bonds. The bond space is constructed via a line graph, and an incidence-constrained bidirectional cross-attention mechanism synchronizes the atom and bond streams at every layer, so that each atom attends only to its incident bonds and vice versa. Evaluated on the NPLIB1 and MassSpecGym benchmarks, DualLGD achieves Top-1 accuracy of 34.37% and 23.89%, respectively, roughly three times the accuracy of the previous state of the art. Even without any pre-training, it surpasses the previous best fully pre-trained model.
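To make the line-graph construction concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a molecule is given as a list of bonds, each a pair of atom indices; the function name `line_graph` and the toy molecules are hypothetical. Each bond becomes a node, and two bond-nodes are connected when the bonds share an atom.

```python
from itertools import combinations


def line_graph(bonds):
    """Return the edges of the line graph of a molecular graph.

    `bonds` is a list of (atom_u, atom_v) pairs. Each bond becomes a node
    (identified by its index in `bonds`); two bond-nodes are linked when
    the corresponding bonds share an endpoint atom.
    """
    # Map every atom to the indices of the bonds incident to it.
    incident = {}
    for bond_idx, (u, v) in enumerate(bonds):
        incident.setdefault(u, []).append(bond_idx)
        incident.setdefault(v, []).append(bond_idx)

    # Every pair of bonds meeting at the same atom yields one line-graph edge.
    edges = set()
    for bond_indices in incident.values():
        for i, j in combinations(sorted(bond_indices), 2):
            edges.add((i, j))
    return sorted(edges)


# Four-atom chain 0-1-2-3 (hypothetical toy molecule, heavy atoms only).
chain_bonds = [(0, 1), (1, 2), (2, 3)]
chain_lg = line_graph(chain_bonds)

# Three-membered ring 0-1-2-0 (hypothetical toy molecule).
ring_bonds = [(0, 1), (1, 2), (2, 0)]
ring_lg = line_graph(ring_bonds)
```

In the chain, each line-graph edge pairs two bonds that meet at an atom, which is the pair that defines a bond angle. In the ring, the three bonds form a triangle in the line graph, so the ring shows up as a cycle among bond-nodes.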
📝 Abstract
De novo molecular generation from tandem mass spectra is a challenging inverse problem whose core difficulty lies in the circular dependency between atom-level and bond-level reasoning: determining a bond's type requires knowing its endpoint atoms' chemical environment, yet an atom's environment is in turn defined by its incident bonds. Existing graph diffusion methods process atoms and bonds within a single computation stream, where atom-bond information synchronization can only occur implicitly across layers. We argue that this single-stream paradigm, rather than the choice of any particular aggregation kernel, is a key architectural bottleneck. We propose DualLGD (Dual-stream Line Graph Diffusion), which reformulates molecular graph denoising as the alternating solution of two coupled subproblems: atom-level reasoning and bond-level reasoning, each operating in its own dedicated representation space. The line graph provides a natural mathematical construction for the bond space, in which bond angles, dihedrals, conjugation chains, and rings correspond to local topological motifs between bonds. Incidence-constrained bidirectional cross-attention synchronizes the two streams at every layer, ensuring that each atom attends only to its incident bonds and vice versa, respecting the fundamental chemical principle that an atom's environment is determined by its bonding context. On the NPLIB1 and MassSpecGym benchmarks, DualLGD achieves top-1 accuracy of 34.37% and 23.89%, respectively, approximately $3\times$ the previous state of the art. Ablation studies confirm the architecture as the primary source of improvement: DualLGD without any pre-training already surpasses the previous best fully pre-trained model.
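The incidence constraint on cross-attention can be illustrated with a short sketch. This is an assumption-laden toy in plain Python, not the paper's architecture: it uses a single head, no learned projections, and hand-picked feature vectors, and the names `incidence_mask`, `transpose`, and `masked_cross_attention` are hypothetical. It shows only the masking idea, namely that an atom's attention weights are computed over its incident bonds alone, and that the transposed mask gives the bond-to-atom direction.

```python
import math


def incidence_mask(num_atoms, bonds):
    """mask[a][b] is True exactly when atom `a` is an endpoint of bond `b`."""
    mask = [[False] * len(bonds) for _ in range(num_atoms)]
    for bond_idx, (u, v) in enumerate(bonds):
        mask[u][bond_idx] = True
        mask[v][bond_idx] = True
    return mask


def transpose(mask):
    """Swap the roles of atoms and bonds in an incidence mask."""
    return [list(column) for column in zip(*mask)]


def masked_cross_attention(queries, keys, values, mask):
    """Scaled dot-product attention restricted to positions allowed by `mask`.

    Row i of the result is a softmax-weighted average of `values[j]` over
    those j with mask[i][j] True; rows with no allowed position get zeros.
    """
    dim = len(queries[0])
    out_dim = len(values[0])
    outputs = []
    for query, allowed_row in zip(queries, mask):
        allowed = [j for j, ok in enumerate(allowed_row) if ok]
        if not allowed:
            outputs.append([0.0] * out_dim)
            continue
        scores = [
            sum(q * k for q, k in zip(query, keys[j])) / math.sqrt(dim)
            for j in allowed
        ]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w / total * values[j][d] for w, j in zip(weights, allowed))
            for d in range(out_dim)
        ])
    return outputs


# Hypothetical three-atom chain 0-1-2 with two bonds and toy 2-d features.
bonds = [(0, 1), (1, 2)]
atom_feats = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
bond_feats = [[2.0, 0.0], [0.0, 2.0]]

mask = incidence_mask(len(atom_feats), bonds)

# Atom stream reads from the bond stream (atoms query, bonds are keys/values).
atom_update = masked_cross_attention(atom_feats, bond_feats, bond_feats, mask)

# Bond stream reads from the atom stream using the transposed mask.
bond_update = masked_cross_attention(
    bond_feats, atom_feats, atom_feats, transpose(mask)
)
```

Atom 0 touches only bond 0, so its update is exactly that bond's value vector, while atom 1 mixes both bonds. In the reverse direction each bond attends to exactly its two endpoint atoms.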