Random Forest Autoencoders for Guided Representation Learning

📅 2025-02-18
🤖 AI Summary
Existing supervised visualization methods primarily optimize for classification performance, and state-of-the-art approaches such as RF-PHATE lack an explicit mapping function, which prevents generalization to unseen samples, particularly in large-scale and label-scarce settings. To address this, we propose Random Forest Autoencoders (RF-AE), a neural network-based framework that combines random forests with autoencoders to provide a learnable out-of-sample kernel extension compatible with any kernel-based dimensionality reduction method. Our approach builds on the information geometry and diffusion-based manifold learning behind RF-PHATE, yielding low-dimensional representations that are robust to the choice of hyperparameters. Evaluated on multiple benchmark datasets, our method outperforms existing methods, including the standard kernel extension of RF-PHATE, in both accuracy and interpretability when visualizing out-of-sample points.

📝 Abstract
Decades of research have produced robust methods for unsupervised data visualization, yet supervised visualization, where expert labels guide representations, remains underexplored, as most supervised approaches prioritize classification over visualization. Recently, RF-PHATE, a diffusion-based manifold learning method leveraging random forests and information geometry, marked significant progress in supervised visualization. However, its lack of an explicit mapping function limits scalability and prevents application to unseen data, posing challenges for large datasets and label-scarce scenarios. To overcome these limitations, we introduce Random Forest Autoencoders (RF-AE), a neural network-based framework for out-of-sample kernel extension that combines the flexibility of autoencoders with the supervised learning strengths of random forests and the geometry captured by RF-PHATE. RF-AE enables efficient out-of-sample supervised visualization and outperforms existing methods, including RF-PHATE's standard kernel extension, in both accuracy and interpretability. Additionally, RF-AE is robust to the choice of hyper-parameters and generalizes to any kernel-based dimensionality reduction method.
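
To make the idea of an out-of-sample kernel extension concrete, here is a minimal sketch of a generic proximity-weighted extension. It is not the RF-AE architecture and not necessarily the exact baseline used in the paper. It assumes the classic leaf co-occurrence proximity (the fraction of trees in which two samples share a leaf), and the names `leaf_proximity`, `extend_embedding`, and the toy data are all hypothetical.

```python
from typing import List, Sequence


def leaf_proximity(leaves_a: Sequence[int], leaves_b: Sequence[int]) -> float:
    """Fraction of trees in which two samples fall in the same leaf.

    Assumption: this is the classic co-occurrence proximity, used here only
    for illustration; the paper may rely on a different forest proximity.
    """
    if len(leaves_a) != len(leaves_b):
        raise ValueError("both samples must be described by the same number of trees")
    matches = sum(1 for a, b in zip(leaves_a, leaves_b) if a == b)
    return matches / len(leaves_a)


def extend_embedding(
    new_leaves: Sequence[int],
    train_leaves: Sequence[Sequence[int]],
    train_embedding: Sequence[Sequence[float]],
) -> List[float]:
    """Embed an unseen point as the proximity-weighted mean of training embeddings."""
    weights = [leaf_proximity(new_leaves, row) for row in train_leaves]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("new point shares no leaf with any training point")
    dim = len(train_embedding[0])
    return [
        sum(w * point[d] for w, point in zip(weights, train_embedding)) / total
        for d in range(dim)
    ]


# Hypothetical toy data: 3 training points, 4 trees, one leaf id per tree.
train_leaves = [
    [0, 1, 2, 0],
    [0, 1, 3, 1],
    [5, 4, 3, 1],
]
# Hypothetical 2-D embedding of the training points (e.g. from a kernel method).
train_embedding = [[0.0, 0.0], [1.0, 0.0], [4.0, 2.0]]
# Leaf ids of an unseen point passed through the same forest.
new_leaves = [0, 1, 3, 0]

new_point = extend_embedding(new_leaves, train_leaves, train_embedding)
print(new_point)
```

In practice the leaf ids could come from a fitted forest, for instance via scikit-learn's `RandomForestClassifier.apply`. An extension like this needs the training set at inference time and can only interpolate between training embeddings, which is the kind of limitation a learned parametric encoder is meant to avoid.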
Problem

Research questions and friction points this paper is trying to address.

Enhances supervised data visualization techniques
Overcomes scalability limitations on large datasets and extends to unseen data
Improves accuracy and interpretability of visualization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Random Forest Autoencoders framework
Combines autoencoders with random forests
Enables efficient supervised visualization
Adrien Aumon
Department of Mathematics and Statistics, Université de Montréal, Montreal, Canada; Mila - Quebec AI Institute
Shuang Ni
Department of Computer Science and Operations Research, Université de Montréal, Montreal, Canada; Mila - Quebec AI Institute
Myriam Lizotte
Department of Mathematics and Statistics, Université de Montréal, Montreal, Canada; Mila - Quebec AI Institute
Guy Wolf
Université de Montréal; Mila
Exploratory Data Analysis, Dimensionality Reduction, Manifold Learning, Geometric Deep Learning, Graph Signal Processing
Kevin R. Moon
Utah State University
Machine Learning, Information Theory, Computational Biology, Signal Processing
Jake S. Rhodes
Brigham Young University
Machine Learning, Data Science