🤖 AI Summary
Existing graph generation methods, particularly those built on message-passing neural networks, struggle to capture long-range dependencies and domain-specific hard constraints (e.g., chemical valency rules or upper bounds on RNA loop lengths), limiting their practicality in chemistry and biology. This paper introduces a grammar-based graph generation framework whose key idea is a domain-dependent graph coarsening procedure that provides shortcuts for long-range dependencies, so that both local and global feasibility constraints can be respected. By sidestepping the information dilution inherent in iterative message passing, the approach can model dependencies over much longer ranges than message-passing generators. The result is a neuro-symbolic style of generation, combining learned flexibility with symbolic constraint satisfaction. On the MOSES benchmark, the method produces small molecules that compare favorably on drug-likeness and synthetic accessibility, and it generates very large RNA secondary structures (graphs with hundreds of nodes) that are accepted as valid members of the target family by the Infernal covariance model, a state-of-the-art RNA classifier.
📝 Abstract
Generative methods for graphs need to be sufficiently flexible to model complex dependencies between sets of nodes. At the same time, the generated graphs need to satisfy domain-dependent feasibility conditions; that is, they should not violate constraints that would make their interpretation impossible within the given application domain (e.g. a molecular graph where an atom has a very large number of chemical bonds). Crucially, constraints can involve not only local but also long-range dependencies: for example, the maximal length of a cycle can be bounded. Currently, a large class of generative approaches for graphs, such as methods based on artificial neural networks, relies on message passing schemes. These approaches suffer from information 'dilution' issues that severely limit the maximal range of the dependencies that can be modeled. To address this problem, we propose a generative approach based on the notion of graph grammars. The key novel idea is to introduce a domain-dependent coarsening procedure to provide short-cuts for long-range dependencies. We show the effectiveness of our proposal in two domains: 1) small drugs and 2) RNA secondary structures. In the first case, we compare the quality of the generated molecular graphs via the Molecular Sets (MOSES) benchmark suite, which evaluates the distance between generated and real molecules, their lipophilicity, synthesizability, and drug-likeness. In the second case, we show that the approach can generate very large graphs (with hundreds of nodes) that are accepted as valid examples for a desired RNA family by the "Infernal" covariance model, a state-of-the-art RNA classifier. Our implementation is available on GitHub: github.com/fabriziocosta/GraphLearn
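To make the two kinds of feasibility conditions concrete, here is a minimal sketch (not the paper's implementation) of checking a local constraint (atom valency) and a long-range constraint (a bound on cycle length) on a small molecular-style graph. The graph encoding, valence table, and function names are illustrative assumptions.

```python
from collections import deque

# Illustrative maximum bond counts per element (a simplified valence table).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "H": 1}

def valid_valences(atoms, bonds):
    """Local constraint: no atom may exceed its maximum total bond order.
    atoms: {node_id: element}, bonds: [(u, v, order), ...]"""
    degree = {i: 0 for i in atoms}
    for u, v, order in bonds:
        degree[u] += order
        degree[v] += order
    return all(degree[i] <= MAX_VALENCE[atoms[i]] for i in atoms)

def cycles_within(atoms, bonds, max_len):
    """Long-range constraint: every cycle has length <= max_len.
    For each edge (u, v), a BFS from u to v that avoids the direct
    edge finds the shortest cycle passing through that edge."""
    adj = {i: set() for i in atoms}
    for u, v, _ in bonds:
        adj[u].add(v)
        adj[v].add(u)
    for u, v, _ in bonds:
        dist = {u: 0}
        queue = deque([u])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if {x, y} == {u, v}:  # skip the direct edge itself
                    continue
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        if v in dist and dist[v] + 1 > max_len:
            return False  # shortest cycle through (u, v) is too long
    return True

# A cyclohexane-like ring: six carbons joined by single bonds.
atoms = {i: "C" for i in range(6)}
bonds = [(i, (i + 1) % 6, 1) for i in range(6)]
print(valid_valences(atoms, bonds))     # each carbon has degree 2 <= 4
print(cycles_within(atoms, bonds, 6))   # the 6-ring satisfies the bound
print(cycles_within(atoms, bonds, 5))   # but violates a bound of 5
```

The valency check is purely local (each node sees only its incident bonds), while the cycle bound requires graph-wide reachability information, which is exactly the kind of long-range dependency that message passing over a fixed number of rounds struggles to propagate.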