BBOmix: A Tabular Benchmark for Hyperparameter Optimization of Unsupervised Biological Representation Learning

📅 2026-06-03
🤖 AI Summary
This study addresses the challenge that deep autoencoders in unsupervised biological representation learning are highly sensitive to hyperparameters, and their reconstruction loss exhibits only weak correlation with downstream task performance, complicating effective hyperparameter optimization. To tackle this issue, the work introduces the first open-source benchmark tailored to real-world multi-omics data, encompassing four autoencoder architectures across seven omics modalities. It systematically evaluates single- and multi-fidelity hyperparameter optimization methods alongside transfer learning strategies. Based on 105,000 model evaluations, the study quantifies the weak relationship between reconstruction loss and downstream performance and establishes a rigorous, reproducible, and fair baseline for unsupervised representation learning in biological contexts.
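One common way to check how well reconstruction loss tracks downstream performance is a rank correlation across configurations. The sketch below is a minimal illustration in plain Python and assumes Spearman's rho as the measure; the summary does not say which statistic the paper uses. The function names and all numbers are made up for illustration and are not benchmark results.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values, one entry per hyperparameter configuration.
recon_loss = [0.10, 0.12, 0.15, 0.20, 0.25, 0.30]
downstream_acc = [0.71, 0.78, 0.69, 0.74, 0.70, 0.72]

rho = spearman(recon_loss, downstream_acc)
print(f"Spearman rho = {rho:.3f}")
```

A rho near zero, as in this toy data, would mean that ranking configurations by reconstruction loss says little about their downstream ranking, which is the situation the summary describes.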
📝 Abstract
The rapid advancement of high-throughput sequencing has led to large, high-dimensional omics datasets. Deep unsupervised learning architectures, particularly Autoencoders (AEs), are increasingly used for dimensionality reduction and representation learning in this domain. However, AEs are highly sensitive to architectural choices and hyperparameters, and unsupervised optimization typically relies on reconstruction loss, which may be a poor proxy for downstream utility. Exhaustive hyperparameter optimization (HPO) is computationally expensive, leading researchers to frequently rely on suboptimal default configurations. To democratize access to large-scale unsupervised HPO research, we introduce BBOmix, the first open-source tabular benchmark for unsupervised representation learning on real-world biological data. Our benchmark includes 105,000 evaluations across four AE architectures and seven multi-omics modalities from the TCGA and SCHC datasets. We quantify the correlation between reconstruction loss and downstream task performance and provide an extensive evaluation of state-of-the-art single-fidelity, multi-fidelity, and transfer learning HPO methods, establishing a rigorous baseline for future research in unsupervised biological representation learning.
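The idea of a tabular benchmark can be sketched as follows: results for every configuration are precomputed once, so an HPO method queries a lookup table instead of training a model. This is a minimal sketch under stated assumptions and not the BBOmix API. The search space, the table layout, and the names `SPACE`, `build_toy_table`, and `random_search` are hypothetical, and the metric values are synthetic random numbers.

```python
import itertools
import random

# Hypothetical search space; the real benchmark's hyperparameters may differ.
SPACE = {
    "latent_dim": [8, 16, 32, 64],
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "dropout": [0.0, 0.1, 0.3],
}


def build_toy_table(seed=0):
    """Stand-in for precomputed results: config tuple -> recorded metrics."""
    rng = random.Random(seed)
    table = {}
    for values in itertools.product(*SPACE.values()):
        table[values] = {
            "recon_loss": rng.uniform(0.1, 1.0),      # synthetic value
            "downstream_acc": rng.uniform(0.5, 0.9),  # synthetic value
        }
    return table


def random_search(table, n_trials, seed=0):
    """Sample configs and look up their loss; no model is trained."""
    rng = random.Random(seed)
    configs = list(table)
    best = None
    for _ in range(n_trials):
        config = rng.choice(configs)
        if best is None or table[config]["recon_loss"] < table[best]["recon_loss"]:
            best = config
    # Pair hyperparameter names with the selected values for readability.
    return dict(zip(SPACE, best)), table[best]


table = build_toy_table()
best_config, metrics = random_search(table, n_trials=20)
print(best_config, metrics)
```

Because each trial is a dictionary lookup, many optimizers and seeds can be compared cheaply on identical data. Selecting by `recon_loss` and then reading `downstream_acc` for the chosen configuration mirrors the unsupervised setting, where the downstream score is not available during the search.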
Problem

Research questions and friction points this paper is trying to address.

hyperparameter optimization
unsupervised representation learning
autoencoders
biological data
reconstruction loss
Innovation

Methods, ideas, or system contributions that make the work stand out.

hyperparameter optimization
unsupervised representation learning
autoencoders
multi-omics
benchmark
Authors
Luca Thale-Bombien
Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresden/Leipzig, Leipzig University
Jan Ewald
Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresden/Leipzig, Leipzig University
Ralf König
Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresden/Leipzig, Leipzig University
Aaron Klein
ELLIS Institute Tübingen, ScaDS.AI
AutoML, Neural Architecture Search, Bayesian Optimization, Deep Learning