🤖 AI Summary
Reaction product prediction faces a dual challenge: template-based methods generalize poorly, while template-free approaches have limited accuracy. This paper introduces the Broad Reaction Set (BRS), a dataset built around 20 generic reaction templates, and ProPreT5, a customized T5 model that combines template guidance with end-to-end sequence-to-sequence learning while preserving chemical validity. Built upon SMILES representations, ProPreT5 leverages large-scale pretraining followed by template-constrained fine-tuning. The authors report that ProPreT5 goes beyond state-of-the-art methods in accuracy, chemical validity, and reaction realism. It is positioned as a middle ground between rigid templates and template-free generalization, aiming at chemically grounded reaction prediction.
📝 Abstract
The accurate prediction of chemical reaction outcomes is a major challenge in computational chemistry. Current models rely heavily on either highly specific reaction templates or template-free methods, both of which have limitations. To address them, this work proposes the Broad Reaction Set (BRS), a dataset featuring 20 generic reaction templates that allow for efficient exploration of chemical space. It also introduces ProPreT5, a T5 model tailored to chemistry that strikes a balance between rigid templates and template-free methods. ProPreT5 generates accurate, valid, and realistic reaction products, making it a promising solution that goes beyond the current state of the art on the complex task of reaction product prediction.