🤖 AI Summary
Problem: Existing GNN-based molecular property prediction methods (e.g., ESAN) treat molecules as bags of independent substructures and ignore how those substructures connect and overlap in space. This leads to poor discrimination of configurational isomers and degraded performance on large molecules.
Method: We propose Graph of Molecule Substructures (GoMS), the first framework that explicitly models a molecule as an *interaction graph*, where nodes represent overlapping subgraphs and edges encode structural relationships. GoMS introduces equivariant constraints and topology-aware edge construction to enable differentiable subgraph partitioning and hierarchical aggregation, breaking the conventional bag-of-substructures assumption.
Contribution/Results: GoMS effectively distinguishes molecules with identical composition but distinct 3D arrangements. It outperforms ESAN and other baselines across multiple benchmarks, achieving up to 12.7% improvement in prediction accuracy for large molecules (>100 atoms). Theoretically, GoMS exhibits strictly stronger expressive power than bag-based models, making it particularly suitable for predicting properties of complex functional materials.
📝 Abstract
While graph neural networks have shown remarkable success in molecular property prediction, current approaches such as Equivariant Subgraph Aggregation Networks (ESAN) treat molecules as bags of independent substructures, overlooking crucial relationships between these components. We present Graph of Molecule Substructures (GoMS), a novel architecture that explicitly models the interactions and spatial arrangements between molecular substructures. Unlike ESAN's bag-based representation, GoMS constructs a graph where nodes represent subgraphs and edges capture their structural relationships, preserving critical topological information about how substructures are connected and overlap within the molecule. Through extensive experiments on public molecular datasets, we demonstrate that GoMS outperforms ESAN and other baseline methods, with particularly large improvements for molecules containing more than 100 atoms. The performance gap widens as molecular size increases, indicating that GoMS is effective for modeling industrial-scale molecules. Our theoretical analysis shows that GoMS can distinguish molecules with identical subgraph compositions but different spatial arrangements. Our approach shows particular promise for materials science applications involving complex molecules where properties emerge from the interplay between multiple functional units. By capturing substructure relationships that are lost in bag-based approaches, GoMS represents a significant advance toward scalable and interpretable molecular property prediction for real-world applications.
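To make the bag-versus-graph distinction concrete, the following is a minimal sketch in plain Python, not the GoMS implementation. It assumes a hypothetical rule for connecting substructures (an edge labeled "overlap" when two substructures share an atom, "bond" when a bond joins them) and compares two toy molecules by a crude count-based signature rather than a learned model. The function names, the fragment labels, and the toy molecules are all illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def substructure_graph(bonds, substructures):
    """Connect substructures that share an atom or are joined by a bond.

    bonds: iterable of (atom_i, atom_j) index pairs.
    substructures: list of (label, set_of_atom_indices).
    Returns a list of (i, j, relation), relation in {"overlap", "bond"}.
    The edge rule is an assumption made for illustration only.
    """
    bond_set = {frozenset(b) for b in bonds}
    edges = []
    for (i, (_, atoms_i)), (j, (_, atoms_j)) in combinations(
        enumerate(substructures), 2
    ):
        if atoms_i & atoms_j:
            edges.append((i, j, "overlap"))
        elif any(frozenset((a, b)) in bond_set for a in atoms_i for b in atoms_j):
            edges.append((i, j, "bond"))
    return edges


def bag_signature(substructures):
    """Bag view: only the multiset of substructure labels is kept."""
    return Counter(label for label, _ in substructures)


def graph_signature(bonds, substructures):
    """Graph view: also count which kinds of substructures are related."""
    labels = [label for label, _ in substructures]
    return Counter(
        (tuple(sorted((labels[i], labels[j]))), relation)
        for i, j, relation in substructure_graph(bonds, substructures)
    )


# Toy molecule A: ring - ring - chain (8 atoms, 8 bonds).
bonds_a = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (5, 6), (6, 7)]
subs_a = [("ring", {0, 1, 2}), ("ring", {3, 4, 5}), ("chain", {6, 7})]

# Toy molecule B: ring - chain - ring (same fragments, different arrangement).
bonds_b = [(0, 1), (1, 2), (0, 2), (5, 6), (6, 7), (5, 7), (2, 3), (3, 4), (4, 5)]
subs_b = [("ring", {0, 1, 2}), ("chain", {3, 4}), ("ring", {5, 6, 7})]

same_bag = bag_signature(subs_a) == bag_signature(subs_b)
same_graph = graph_signature(bonds_a, subs_a) == graph_signature(bonds_b, subs_b)
print(same_bag, same_graph)
```

In this toy case the two molecules have the same bag of fragments (two rings and one chain), so a purely bag-based signature cannot separate them, while the signature that also records relations between fragments differs because only molecule A has two directly bonded rings. A learned model would replace these hand-built signatures with message passing over the substructure graph.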