🤖 AI Summary
This work addresses the synthetic accessibility bottleneck in molecular discovery by recasting two problems, designing synthesizable molecules and recommending synthesizable analogs of unsynthesizable molecules, as program synthesis. It proposes a bilevel framework that decouples the syntactic skeleton of a synthetic tree from its semantics. At the syntax level, Markov chain Monte Carlo (MCMC) simulations over the space of syntactic skeletons iteratively refine the skeleton for analog generation; for optimizing a black-box oracle, evolutionary algorithms search a joint design space of syntactic templates and molecular descriptors. At the semantics level, policies are trained on the fixed-horizon Markov decision process that a fixed skeleton imposes, which amortizes the search cost of filling in the rest of the synthesis pathway. The authors report performance advantages for both synthesizable analog generation and synthesizable molecule design. The approach gives the user explicit control over the resources required for synthesis and biases the design space toward simpler solutions, which the authors consider particularly promising for autonomous synthesis platforms.
📝 Abstract
Designing synthetically accessible molecules and recommending analogs to unsynthesizable molecules are important problems for accelerating molecular discovery. We reconceptualize both problems using ideas from program synthesis. Drawing inspiration from syntax-guided synthesis approaches, we decouple the syntactic skeleton from the semantics of a synthetic tree to create a bilevel framework for reasoning about the combinatorial space of synthesis pathways. Given a molecule we aim to generate analogs for, we iteratively refine its skeletal characteristics via Markov chain Monte Carlo simulations over the space of syntactic skeletons. Given a black-box oracle to optimize, we formulate a joint design space over syntactic templates and molecular descriptors and introduce evolutionary algorithms that optimize both syntactic and semantic dimensions synergistically. Our key insight is that once the syntactic skeleton is set, we can amortize the search complexity of deriving the program's semantics by training policies to fully utilize the fixed-horizon Markov decision process imposed by the syntactic template. We demonstrate performance advantages of our bilevel framework for synthesizable analog generation and synthesizable molecule design. Notably, our approach offers the user explicit control over the resources required to perform synthesis and biases the design space towards simpler solutions, making it particularly promising for autonomous synthesis platforms. Code is at https://github.com/shiningsunnyday/SynthesisNet.
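The bilevel idea in the abstract can be illustrated with a deliberately small toy, which is not the authors' implementation and does not use the SynthesisNet code. Everything in it is an assumption made for illustration: a skeleton is a fixed-length tuple of step types (`none`, `uni`, `bi`), "molecules" are plain integers, building blocks are the numbers in `BLOCKS`, and a greedy rule stands in for the trained policy. The outer loop runs a Metropolis search over skeletons with a symmetric proposal; the inner function fills a fixed skeleton in a number of decisions set by the skeleton itself, and a per-step penalty biases the search toward shorter routes.

```python
import math
import random

STEP_TYPES = ("none", "uni", "bi")  # hypothetical skeleton alphabet
BLOCKS = (2, 3, 5, 7)               # toy "building blocks" (plain integers)


def fill_skeleton(skeleton, target):
    """Inner level: one decision per open slot, so the horizon is fixed
    by the skeleton. A greedy rule stands in for a trained policy."""
    value = min(BLOCKS, key=lambda b: abs(target - b))  # starting block
    choices = [value]
    for step in skeleton:
        if step == "uni":    # unimolecular step: toy fixed transform
            value += 1
        elif step == "bi":   # bimolecular step: choose a partner block
            block = min(BLOCKS, key=lambda b: abs(target - (value + b)))
            value += block
            choices.append(block)
    return value, choices


def score(skeleton, target, step_penalty=0.5):
    """Closeness to the target minus a penalty per synthesis step."""
    value, _ = fill_skeleton(skeleton, target)
    n_steps = sum(step != "none" for step in skeleton)
    return -abs(target - value) - step_penalty * n_steps


def propose(skeleton, rng):
    """Symmetric proposal: resample one position to a different step type."""
    i = rng.randrange(len(skeleton))
    new_step = rng.choice([s for s in STEP_TYPES if s != skeleton[i]])
    return skeleton[:i] + (new_step,) + skeleton[i + 1:]


def mcmc_search(target, max_steps=4, n_iters=2000, temperature=1.0, seed=0):
    """Outer level: Metropolis search over skeletons of at most max_steps."""
    rng = random.Random(seed)
    current = ("none",) * max_steps
    current_score = score(current, target)
    best, best_score = current, current_score
    for _ in range(n_iters):
        candidate = propose(current, rng)
        candidate_score = score(candidate, target)
        # Metropolis acceptance rule (valid because the proposal is symmetric)
        delta = (candidate_score - current_score) / temperature
        if rng.random() < math.exp(min(0.0, delta)):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score


if __name__ == "__main__":
    skeleton, value = mcmc_search(target=12)
    print(skeleton, value, fill_skeleton(skeleton, 12))
```

In this toy, `max_steps` plays the role of the user-specified resource limit and `step_penalty` the bias toward simpler solutions; both names are hypothetical and chosen only to mirror the two properties the abstract highlights.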