🤖 AI Summary
To support reliability assurance for pre-trained foundation models in engineering deployments, particularly in communication systems, this paper frames hyperparameter selection as a multiple hypothesis testing problem, which yields verifiable statistical upper bounds on population risk within the Learn-Then-Test (LTT) framework. Methodologically, the paper develops four key extensions: (i) multi-objective hyperparameter optimization, (ii) Bayesian integration of domain-specific prior knowledge, (iii) graph-based modeling of task-dependent dependency structures, and (iv) an adaptive selection mechanism tailored to heterogeneous risk metrics. The resulting framework uniformly supports diverse risk measures, including bit error rate and calibration error, and accommodates statistical guarantees of varying strength, such as false discovery rate (FDR) control and high-confidence upper bounds. Illustrative applications to communication systems demonstrate improved deployment reliability, with analytically verifiable risk bounds.
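To make the hypothesis-testing framing concrete, the sketch below implements a minimal LTT-style selection loop: each candidate hyperparameter is tested against the null hypothesis that its population risk exceeds a target α, and family-wise error is controlled with a Bonferroni correction. This is a sketch, not the paper's implementation; it assumes a loss bounded in [0, 1] (e.g., bit error rate), and the names `ltt_select`, `risk_fn`, and `candidate_lambdas` are illustrative.

```python
# Minimal sketch of a Learn-Then-Test (LTT) selection loop, assuming a
# risk bounded in [0, 1]. All names below are illustrative placeholders,
# not the paper's API.
import numpy as np

def hoeffding_p_value(emp_risk, n, alpha):
    """Valid p-value for H0: population risk R(lambda) > alpha,
    via Hoeffding's inequality for losses bounded in [0, 1]."""
    return float(np.exp(-2.0 * n * max(alpha - emp_risk, 0.0) ** 2))

def ltt_select(candidate_lambdas, risk_fn, calib_data, alpha=0.1, delta=0.05):
    """Return hyperparameters certified to have risk <= alpha, all
    simultaneously, with probability >= 1 - delta (FWER control via
    Bonferroni)."""
    n = len(calib_data)
    m = len(candidate_lambdas)
    selected = []
    for lam in candidate_lambdas:
        # Per-sample losses on held-out calibration data.
        losses = np.array([risk_fn(lam, x) for x in calib_data])
        p = hoeffding_p_value(losses.mean(), n, alpha)
        # Bonferroni: certify lambda only if p <= delta / #tests.
        if p <= delta / m:
            selected.append(lam)
    return selected
```

Under these assumptions, every hyperparameter returned by `ltt_select` satisfies the risk target simultaneously with probability at least 1 − δ, which is the high-confidence guarantee referred to above.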
📝 Abstract
Hyperparameter selection is a critical step in the deployment of artificial intelligence (AI) models, particularly in the current era of foundational, pre-trained models. By framing hyperparameter selection as a multiple hypothesis testing problem, recent research has shown that it is possible to provide statistical guarantees on population risk measures attained by the selected hyperparameter. This paper reviews the Learn-Then-Test (LTT) framework, which formalizes this approach, and explores several extensions tailored to engineering-relevant scenarios. These extensions encompass different risk measures and statistical guarantees, multi-objective optimization, the incorporation of prior knowledge and dependency structures into the hyperparameter selection process, and adaptivity. The paper also includes illustrative applications to communication systems.
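As an example of the weaker class of guarantees mentioned in both the summary and the abstract, false discovery rate control can replace the family-wise guarantee: a Benjamini-Hochberg step over the same per-hyperparameter p-values certifies a larger set of hyperparameters while only bounding, in expectation, the fraction of certified hyperparameters that violate the risk target. This is a sketch assuming independent (or positively dependent) p-values; the function name and signature are illustrative.

```python
# Sketch of the FDR-controlling variant: Benjamini-Hochberg over
# per-hyperparameter p-values (e.g., those from hoeffding_p_value).
# Assumes independent or positively dependent p-values.
import numpy as np

def benjamini_hochberg(p_values, delta=0.05):
    """Return indices of certified hyperparameters at FDR level delta."""
    p = np.asarray(p_values)
    m = len(p)
    order = np.argsort(p)
    # Find the largest k (1-indexed) with p_(k) <= k * delta / m.
    thresholds = (np.arange(1, m + 1) * delta) / m
    below = p[order] <= thresholds
    if not below.any():
        return []
    k = np.max(np.nonzero(below)[0])  # 0-indexed cutoff
    # Certify the hypotheses with the k+1 smallest p-values.
    return sorted(order[: k + 1].tolist())
```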