High-Dimensional Analysis of Bootstrap Ensemble Classifiers

📅 2025-05-20

📈 Citations: 0

✨ Influential: 0

career value

197K/year

🤖 AI Summary

This paper addresses the insufficient generalization performance of Least Squares Support Vector Machines (LSSVM) in high-dimensional, large-sample regimes. We propose a theoretical analysis and optimization framework based on Bootstrap ensemble learning. For the first time, we systematically incorporate Random Matrix Theory into the high-dimensional asymptotic analysis of Bootstrap-LSSVM, establishing a generalization error characterization model under joint growth of sample size $n$ and dimension $p$, where $p/n o gamma$. This analysis reveals convergence properties and phase-transition phenomena. Leveraging these insights, we derive closed-form, adaptive selection rules for both the number of bootstrap subsets and the regularization parameter. Combining theoretical derivation with extensive numerical experiments—across multiple synthetic and real-world datasets—we demonstrate that our strategy improves classification accuracy by 3.2–7.8%, significantly outperforming empirical hyperparameter tuning methods.

Technology Category

Application Category

📝 Abstract

Bootstrap methods have long been a cornerstone of ensemble learning in machine learning. This paper presents a theoretical analysis of bootstrap techniques applied to the Least Square Support Vector Machine (LSSVM) ensemble in the context of large and growing sample sizes and feature dimensionalities. Leveraging tools from Random Matrix Theory, we investigate the performance of this classifier that aggregates decision functions from multiple weak classifiers, each trained on different subsets of the data. We provide insights into the use of bootstrap methods in high-dimensional settings, enhancing our understanding of their impact. Based on these findings, we propose strategies to select the number of subsets and the regularization parameter that maximize the performance of the LSSVM. Empirical experiments on synthetic and real-world datasets validate our theoretical results.

Problem

Research questions and friction points this paper is trying to address.

Analyzes bootstrap techniques for LSSVM ensembles in high dimensions

Investigates performance of aggregated weak classifiers using Random Matrix Theory

Proposes strategies to optimize subset count and regularization parameters

Innovation

Methods, ideas, or system contributions that make the work stand out.

Bootstrap ensemble with LSSVM for high dimensions

Random Matrix Theory analyzes classifier performance

Optimizes subset count and regularization parameters

🔎 Similar Papers

Prevalidated ridge regression is a highly-efficient drop-in replacement for logistic regression for high-dimensional data

2024-01-28arXiv.orgCitations: 0

Amazon

218,800.00 - 295,900.00 USD annually

Seattle, WA, USA

Machine Learning Engineer