🤖 AI Summary
To address the decoupling of prediction and generation capabilities in generative models for scientific discovery, this paper proposes a VAE-DKL synergistic framework: deep kernel learning (DKL) is embedded into the latent space of a variational autoencoder (VAE), enabling structured modeling and property-guided optimization of latent variables via Gaussian process (GP) regression. This approach achieves, for the first time, end-to-end joint training of generation and property prediction. On the QM9 dataset, it attains a prediction error of 0.12 eV for enthalpy—surpassing a standalone VAE+GP baseline—and enables controllable generation of novel molecular structures outside the training set that satisfy target property constraints. The core innovation lies in the co-optimization mechanism within the latent space: it preserves the VAE’s high-fidelity sampling capability while endowing it with interpretable and intervenable property prediction functionality, thereby significantly enhancing efficiency in materials inverse design.
📝 Abstract
We introduce a Deep Kernel Learning Variational Autoencoder (VAE-DKL) framework that integrates the generative power of a Variational Autoencoder (VAE) with the predictive nature of Deep Kernel Learning (DKL). The VAE learns a latent representation of high-dimensional data, enabling the generation of novel structures, while DKL refines this latent space by structuring it in alignment with target properties through Gaussian Process (GP) regression. This approach preserves the generative capabilities of the VAE while enhancing its latent space for GP-based property prediction. We evaluate the framework on two datasets: a structured card dataset with predefined variational factors and the QM9 molecular dataset, where enthalpy serves as the target function for optimization. The model demonstrates high-precision property prediction and enables the generation of novel out-of-training subset structures with desired characteristics. The VAE-DKL framework offers a promising approach for high-throughput material discovery and molecular design, balancing structured latent space organization with generative flexibility.