🤖 AI Summary
Solvent selection—a critical yet challenging task in chemistry—is hindered by theoretical modeling difficulties and severe data scarcity, especially for continuous-flow processes. Method: We introduce the first temporal solvent selection benchmark dataset tailored for flow chemistry, encompassing 1,200+ continuous process conditions with high-resolution transient flow-control parameters and corresponding yield labels. To address sparse, sequential process spaces, we propose a temporal regression framework integrating domain-informed feature engineering, transfer learning, and active learning. Contribution/Results: Our method significantly improves prediction accuracy for solvent substitution under low-data regimes, reducing mean absolute error by 32% on average. The dataset fills a key gap in AI for Chemistry—namely, benchmarks for time-series-driven, few-shot solvent replacement—and empirically validates multiple AI strategies for sustainable chemical manufacturing. This work advances reproducible, scalable AI benchmarking in chemistry.
📝 Abstract
Machine learning has promised to change the landscape of laboratory chemistry, with impressive results in molecular property prediction and reaction retrosynthesis. However, chemical datasets are often inaccessible to the machine learning community because they require cleaning or a thorough understanding of the chemistry, or are simply not available. In this paper, we introduce a novel dataset for yield prediction, providing the first-ever transient flow dataset for machine learning benchmarking, covering over 1,200 process conditions. While previous datasets focus on discrete parameters, our experimental setup allows us to sample a large number of continuous process conditions, generating new challenges for machine learning models. We focus on solvent selection, a task that is particularly difficult to model theoretically and therefore ripe for machine learning applications. We showcase benchmarking for regression algorithms, transfer-learning approaches, feature engineering, and active learning, with important applications in solvent replacement and sustainable manufacturing.