High-Dimensional Analysis of Bootstrap Ensemble Classifiers

📅 2025-05-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the insufficient generalization performance of Least Squares Support Vector Machines (LSSVM) in high-dimensional, large-sample regimes. We propose a theoretical analysis and optimization framework based on Bootstrap ensemble learning. For the first time, we systematically incorporate Random Matrix Theory into the high-dimensional asymptotic analysis of Bootstrap-LSSVM, establishing a generalization error characterization model under joint growth of sample size $n$ and dimension $p$, where $p/n o gamma$. This analysis reveals convergence properties and phase-transition phenomena. Leveraging these insights, we derive closed-form, adaptive selection rules for both the number of bootstrap subsets and the regularization parameter. Combining theoretical derivation with extensive numerical experiments—across multiple synthetic and real-world datasets—we demonstrate that our strategy improves classification accuracy by 3.2–7.8%, significantly outperforming empirical hyperparameter tuning methods.

Technology Category

Application Category

📝 Abstract
Bootstrap methods have long been a cornerstone of ensemble learning in machine learning. This paper presents a theoretical analysis of bootstrap techniques applied to the Least Square Support Vector Machine (LSSVM) ensemble in the context of large and growing sample sizes and feature dimensionalities. Leveraging tools from Random Matrix Theory, we investigate the performance of this classifier that aggregates decision functions from multiple weak classifiers, each trained on different subsets of the data. We provide insights into the use of bootstrap methods in high-dimensional settings, enhancing our understanding of their impact. Based on these findings, we propose strategies to select the number of subsets and the regularization parameter that maximize the performance of the LSSVM. Empirical experiments on synthetic and real-world datasets validate our theoretical results.
Problem

Research questions and friction points this paper is trying to address.

Analyzes bootstrap techniques for LSSVM ensembles in high dimensions
Investigates performance of aggregated weak classifiers using Random Matrix Theory
Proposes strategies to optimize subset count and regularization parameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bootstrap ensemble with LSSVM for high dimensions
Random Matrix Theory analyzes classifier performance
Optimizes subset count and regularization parameters
Hamza Cherkaoui
Hamza Cherkaoui
Télécom SudParis
banditsDiffusion modelsMarkov chainsLLMsCurriculum learning
M
Malik Tiomoko
Huawei Noah’s Ark Lab, Huawei Technologies, France.
M
M. Seddik
Technology Innovation Institute, Abu Dhabi, United Arab Emirates.
Cosme Louart
Cosme Louart
Assistant Professor, Chinese University of Hong Kong, Shenzhen
Random matricesConcentration of the measureMachine learning
E
Ekkehard Schnoor
Fraunhofer Heinrich Hertz Institute, Department of Artificial Intelligence, 10587 Berlin, Germany.
B
Balázs Kégl
Huawei Noah’s Ark Lab, Huawei Technologies, France.