🤖 AI Summary
Graph Bayesian optimization (BO) for graph-structured neural architecture search (NAS) suffers from intractable acquisition function optimization due to the discrete, non-differentiable nature of graph spaces.
Method: We introduce the first explicit, differentiable, and complete mathematical model for graph input spaces—unifying key structural properties (e.g., reachability, shortest path) via a principled integration of graph kernels and acquisition functions. We propose the first optimization-friendly graph encoding provably equivalent to the original graph space, and formulate graph acquisition optimization as a structured discrete optimization problem under topological constraints. Our framework synergistically combines graph encoding theory, structured graph kernel design, and a Bayesian–discrete co-optimization paradigm.
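The structural properties mentioned above can be made concrete. As a rough illustration (not the paper's formulation, and the function names are our own), the reachability and shortest-path structure of a candidate architecture's DAG can be computed directly from its adjacency matrix:

```python
import numpy as np

def reachability(adj):
    """Boolean transitive closure of a DAG adjacency matrix (Warshall)."""
    R = np.array(adj, dtype=bool)
    n = R.shape[0]
    for k in range(n):
        # i reaches j if i reaches k and k reaches j
        R |= np.outer(R[:, k], R[k, :])
    return R

def shortest_paths(adj):
    """All-pairs shortest hop counts (Floyd-Warshall); np.inf if unreachable."""
    A = np.array(adj, dtype=bool)
    D = np.where(A, 1.0, np.inf)
    np.fill_diagonal(D, 0.0)
    n = D.shape[0]
    for k in range(n):
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D

# Toy 3-node cell: 0 -> 1 -> 2
adj = [[0, 1, 0],
       [0, 0, 1],
       [0, 0, 0]]
R = reachability(adj)   # R[0, 2] is True via the path 0 -> 1 -> 2
D = shortest_paths(adj) # D[0, 2] is 2.0 (two hops)
```

Matrices like `R` and `D` are one way such properties become explicit, differentiable-friendly inputs to a kernel, in the spirit of the encoding the summary describes.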
Results: The method significantly improves search efficiency and best-architecture discovery rates across multiple NAS benchmarks, demonstrating exceptional robustness and generalization—especially under data-scarce conditions.
📝 Abstract
Graph Bayesian optimization (BO) has shown potential as a powerful and data-efficient tool for neural architecture search (NAS). Most existing graph BO works focus on developing graph surrogate models, i.e., metrics of networks and/or different kernels to quantify the similarity between networks. However, acquisition optimization, a discrete optimization task over graph structures, is not well studied because formulating the graph search space and acquisition functions is complex. This paper presents explicit optimization formulations for the graph input space, including properties such as reachability and shortest paths, which are then used to formulate graph kernels and the acquisition function. We theoretically prove that the proposed encoding is an equivalent representation of the graph space and provide restrictions for the NAS domain with either node or edge labels. Numerical results over several NAS benchmarks show that our method efficiently finds the optimal architecture in most cases, highlighting its efficacy.
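To make the pipeline concrete, here is a minimal, hypothetical sketch of graph BO for NAS, not the paper's method: a Weisfeiler-Lehman-style subtree kernel stands in for the structured graph kernel, a Gaussian-process posterior serves as the surrogate, and expected improvement is maximized by enumeration over a small discrete candidate pool (the paper instead solves a structured discrete optimization under topological constraints). All names here are illustrative.

```python
import numpy as np
from math import erf, sqrt, exp, pi

def wl_features(adj, labels, iters=2):
    """WL label refinement on a labeled graph; returns a label-count multiset."""
    labels, counts = list(labels), {}
    n = len(labels)
    for _ in range(iters + 1):
        for l in labels:
            counts[l] = counts.get(l, 0) + 1
        # Relabel each node by (own label, sorted multiset of neighbor labels).
        labels = [(labels[i],
                   tuple(sorted(labels[j] for j in range(n)
                                if adj[i][j] or adj[j][i])))
                  for i in range(n)]
    return counts

def wl_kernel(g1, g2, iters=2):
    """WL subtree kernel: inner product of label-count multisets."""
    f1, f2 = wl_features(*g1, iters), wl_features(*g2, iters)
    return float(sum(v * f2.get(k, 0) for k, v in f1.items()))

def gp_posterior(K, y, k_star, k_ss, noise=1e-6):
    """Exact GP posterior mean/variance at one test graph."""
    Kn = K + noise * np.eye(len(y))
    mu = k_star @ np.linalg.solve(Kn, y)
    var = k_ss - k_star @ np.linalg.solve(Kn, k_star)
    return mu, max(var, 1e-12)

def expected_improvement(mu, var, best):
    """EI for maximization of validation accuracy."""
    s = sqrt(var)
    z = (mu - best) / s
    phi = exp(-z * z / 2) / sqrt(2 * pi)
    Phi = 0.5 * (1 + erf(z / sqrt(2)))
    return (mu - best) * Phi + s * phi

# Toy cell-based search space: a graph is (adjacency, node operation labels).
g_obs = [([[0, 1], [0, 0]], ("conv", "out")),
         ([[0, 1, 1], [0, 0, 1], [0, 0, 0]], ("conv", "pool", "out"))]
y = np.array([0.90, 0.93])  # hypothetical validation accuracies
cands = [([[0, 1, 0], [0, 0, 1], [0, 0, 0]], ("pool", "conv", "out")),
         ([[0, 1], [0, 0]], ("pool", "out"))]

K = np.array([[wl_kernel(a, b) for b in g_obs] for a in g_obs])
scores = []
for c in cands:
    k_star = np.array([wl_kernel(c, g) for g in g_obs])
    mu, var = gp_posterior(K, y, k_star, wl_kernel(c, c))
    scores.append(expected_improvement(mu, var, y.max()))
next_idx = int(np.argmax(scores))  # candidate architecture to train next
```

In a real NAS loop the selected candidate would be trained, its accuracy appended to `y`, and the surrogate refit; the hard part the abstract targets is replacing the enumeration over `cands` with principled optimization over the full constrained graph space.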