A graph-structured distance for mixed-variable domains with meta variables

2024-05-20
AI Summary
Modeling heterogeneous data with mixed variable types (continuous, integer, categorical) in hierarchical domains, including meta variables that govern the problem structure, remains challenging, especially when data points are structurally different and do not share the same variables. Method: The authors propose a unified graph-based modeling framework for such variable-structure, mixed-variable inputs. Meta variables determine the structure of a graph representation of the domain, mixed-variable embedding is combined with hierarchical conditional-space modeling, and a new graph-structured distance compares points. Contribution/Results: (1) a flexible modeling framework in which meta variables dynamically determine the graph structure; (2) a graph-structured distance designed explicitly for comparing mixed-variable points that do not share the same variables. The contributions are evaluated on regression tasks that model the performance of a multilayer perceptron (MLP) with respect to its hyperparameters, where using whole heterogeneous datasets, rather than partitioning them, is meant to help when data is limited.
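
To make the role of a meta variable concrete, here is a minimal Python sketch of a hypothetical MLP hyperparameter domain. Everything in it is an illustrative assumption, not the authors' formulation: the variable names (`n_layers`, `learning_rate`, `activation`, `units_i`), their bounds, and the helpers `included_variables` and `is_valid`. The meta variable `n_layers` decides how many `units_i` variables a point contains, so two valid points can have different sets of variables.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Value = Union[float, int, str]


@dataclass(frozen=True)
class Variable:
    name: str
    kind: str       # "continuous", "integer" or "categorical"
    domain: Tuple   # (lower, upper) for numeric kinds, allowed choices for categorical


def included_variables(n_layers: int) -> List[Variable]:
    """Hypothetical MLP domain: the meta variable n_layers fixes how many units_i exist."""
    variables = [
        Variable("n_layers", "integer", (1, 3)),  # meta variable (assumed bounds)
        Variable("learning_rate", "continuous", (1e-4, 1e-1)),
        Variable("activation", "categorical", ("relu", "tanh")),
    ]
    # One layer-size variable per hidden layer selected by the meta variable.
    for i in range(1, n_layers + 1):
        variables.append(Variable(f"units_{i}", "integer", (8, 256)))
    return variables


def is_valid(point: Dict[str, Value]) -> bool:
    """Check that a point has exactly the variables its meta value calls for, within bounds."""
    if "n_layers" not in point:
        return False
    expected = {v.name: v for v in included_variables(int(point["n_layers"]))}
    if set(point) != set(expected):
        return False  # missing or extra variables for this meta value
    for name, value in point.items():
        var = expected[name]
        if var.kind == "categorical":
            if value not in var.domain:
                return False
        else:
            lower, upper = var.domain
            if not lower <= value <= upper:
                return False
            if var.kind == "integer" and int(value) != value:
                return False
    return True
```

With this sketch, a point with `n_layers = 2` is valid only if it carries both `units_1` and `units_2`; the same dictionary with `n_layers = 1` is rejected because `units_2` is no longer part of the domain.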

Abstract
Heterogeneous datasets emerge in various machine learning and optimization applications that feature different input sources, types or formats. Most models or methods do not natively tackle heterogeneity. Hence, such datasets are often partitioned into smaller and simpler ones, which may limit the generalizability or performance, especially if data is limited. The first main contribution of this work is a modeling framework that generalizes hierarchical, tree-structured, variable-size or conditional search frameworks. The framework models mixed-variable domains in which variables may be continuous, integer, or categorical, with some identified as meta when they influence the structure of the problem. The second main contribution is a novel distance that compares any pair of mixed-variable points that do not share the same variables, which allows whole heterogeneous datasets that reside in mixed-variable domains with meta variables to be used. The contributions are illustrated on several regression experiments, in which the performance of a multilayer perceptron with respect to its hyperparameters is modeled.
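
A distance over such points has to handle variables that appear in only one of them. The Python sketch below is a simple stand-in for illustration, not the graph-structured distance proposed in the paper: it sums Gower-style per-variable terms (range-normalized absolute difference for numeric variables, 0/1 mismatch for categorical ones) and charges a fixed penalty for each unshared variable. The specification table `SPECS`, the constant `MISSING_PENALTY` and the function `mixed_distance` are hypothetical.

```python
from typing import Dict, Tuple, Union

Value = Union[float, int, str]

# Hypothetical specification: name -> (kind, numeric bounds or categorical choices).
SPECS: Dict[str, Tuple[str, tuple]] = {
    "n_layers": ("integer", (1, 3)),
    "learning_rate": ("continuous", (1e-4, 1e-1)),
    "activation": ("categorical", ("relu", "tanh")),
    "units_1": ("integer", (8, 256)),
    "units_2": ("integer", (8, 256)),
    "units_3": ("integer", (8, 256)),
}

# Assumed contribution of a variable that appears in only one of the two points.
MISSING_PENALTY = 1.0


def mixed_distance(x: Dict[str, Value], y: Dict[str, Value]) -> float:
    """Sum of per-variable terms over the union of the variables of x and y."""
    total = 0.0
    for name in sorted(set(x) | set(y)):
        if name not in x or name not in y:
            # Unshared variable: charge the fixed penalty.
            total += MISSING_PENALTY
            continue
        kind, domain = SPECS[name]
        if kind == "categorical":
            # 0/1 mismatch for categorical values.
            total += 0.0 if x[name] == y[name] else 1.0
        else:
            # Range-normalized absolute difference for continuous and integer values.
            lower, upper = domain
            total += abs(float(x[name]) - float(y[name])) / (upper - lower)
    return total


if __name__ == "__main__":
    a = {"n_layers": 1, "learning_rate": 0.01, "activation": "relu", "units_1": 64}
    b = {"n_layers": 2, "learning_rate": 0.01, "activation": "tanh",
         "units_1": 64, "units_2": 128}
    print(mixed_distance(a, b))  # 2.5
```

Here the terms are 0.5 for `n_layers`, 1 for `activation`, 1 for the unshared `units_2`, and 0 for the rest. Setting the penalty to 1 makes an unshared variable count as much as a maximally different shared one; that is a modeling choice of this sketch only.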
Problem

Research questions and friction points this paper is trying to address.

Modeling mixed-variable hierarchical domains with meta variables
Comparing points in heterogeneous datasets with different variables
Enhancing performance in limited-data scenarios via unified frameworks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalizes hierarchical and tree-structured modeling frameworks
Introduces a distance for mixed-variable hierarchical domains
Handles continuous, integer, categorical, and meta variables
Edward Hallé-Hannan
GERAD and Department of Mathematics and Industrial Engineering, Polytechnique Montréal
Charles Audet
GERAD and Department of Mathematics and Industrial Engineering, Polytechnique Montréal
Y. Diouane
GERAD and Department of Mathematics and Industrial Engineering, Polytechnique Montréal
Sébastien Le Digabel
GERAD and Department of Mathematics and Industrial Engineering, Polytechnique Montréal
P. Saves
DTIS, ONERA and Fédération ENAC ISAE-SUPAERO ONERA, Université de Toulouse, France