A new measure of dependence: Integrated $R^2$

📅 2025-05-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the problem of quantifying the dependence of a random variable $Y$ on a random vector $X$. We propose Integrated $R^2$, a nonparametric dependence measure that simultaneously satisfies axiomatic completeness (independence $\Leftrightarrow$ value 0; deterministic functional dependence $\Leftrightarrow$ value 1), computational tractability, and interpretability. Building upon this measure, we develop FORD: a parameter-free, model-agnostic feature selection algorithm based on a greedy forward-selection framework. FORD employs estimators of the conditional-expectation-based $R^2$, implemented via kernel or nearest-neighbor methods, and is proven asymptotically consistent. Experiments demonstrate that FORD significantly outperforms baselines such as FOCI on both synthetic and real-world datasets in multivariate nonlinear and high-dimensional sparse settings, achieving superior feature identification accuracy and more stable rankings.

📝 Abstract
We propose a new measure of dependence that quantifies the degree to which a random variable $Y$ depends on a random vector $X$. This measure is zero if and only if $Y$ and $X$ are independent, and equals one if and only if $Y$ is a measurable function of $X$. We introduce a simple and interpretable estimator that is comparable in ease of computation to classical correlation coefficients such as Pearson's, Spearman's, or Chatterjee's. Building on this coefficient, we develop a model-free variable selection algorithm, feature ordering by dependence (FORD), inspired by FOCI. FORD requires no tuning parameters and is provably consistent under suitable sparsity assumptions. We demonstrate its effectiveness and improvements over FOCI through experiments on both synthetic and real datasets.
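The abstract does not spell out the estimator, so the sketch below implements the closely related Azadkia–Chatterjee nearest-neighbor coefficient that FOCI builds on, which shares the key endpoints: approximately 0 under independence and approximately 1 when $Y$ is a measurable function of $X$. The function name `nn_dependence` and all implementation details are illustrative assumptions, not the paper's code.

```python
import numpy as np

def nn_dependence(y, x):
    """Nearest-neighbor dependence coefficient in the style of
    Azadkia and Chatterjee -- an illustrative stand-in for the
    paper's Integrated R^2 estimator, not the authors' code."""
    y = np.asarray(y, dtype=float)
    x = np.atleast_2d(np.asarray(x, dtype=float))
    if x.shape[0] != len(y):          # accept both (n,) and (n, d) inputs
        x = x.T
    n = len(y)
    # R_i = #{j : Y_j <= Y_i},  L_i = #{j : Y_j >= Y_i}
    R = (y[:, None] >= y[None, :]).sum(axis=1)
    L = (y[:, None] <= y[None, :]).sum(axis=1)
    # M(i): index of the nearest neighbor of X_i (i itself excluded)
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    M = d2.argmin(axis=1)
    num = (n * np.minimum(R, R[M]) - L ** 2).sum()
    den = (L * (n - L)).sum()
    return num / den
```

Like the abstract's estimator, this runs in the same spirit as classical correlation coefficients: ranks plus one nearest-neighbor query per point, with no tuning parameters.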
Problem

Research questions and friction points this paper is trying to address.

How can the degree to which a random variable depends on a random vector be quantified, with value 0 exactly at independence and 1 exactly at functional dependence?
How can such a measure be estimated as cheaply as classical correlation coefficients (Pearson, Spearman, Chatterjee)?
How can variable selection be performed model-free, without tuning parameters, while retaining consistency guarantees?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrated $R^2$: a new dependence measure that is 0 iff $Y$ and $X$ are independent and 1 iff $Y$ is a measurable function of $X$
A simple, interpretable estimator comparable in computational cost to classical correlation coefficients
FORD: a model-free, tuning-parameter-free forward selection algorithm, provably consistent under suitable sparsity assumptions
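The summary describes FORD as a greedy forward-selection procedure built on a dependence estimator. A minimal generic sketch of that loop is below; the pluggable `score` argument, the `tol` stopping rule, and the function name are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def forward_select(y, X, score, tol=0.0):
    """Greedy forward selection in the spirit of FORD/FOCI (sketch).

    `score(y, X_sub)` is any dependence estimator (e.g. an integrated
    R^2 estimate); features are added while the score of the selected
    subset improves by more than `tol`."""
    n, p = X.shape
    selected, best = [], -np.inf
    while len(selected) < p:
        remaining = [j for j in range(p) if j not in selected]
        # score each candidate jointly with the features already chosen
        scores = [score(y, X[:, selected + [j]]) for j in remaining]
        k = int(np.argmax(scores))
        if scores[k] <= best + tol:   # no further improvement: stop
            break
        best = scores[k]
        selected.append(remaining[k])
    return selected
```

Because candidates are scored jointly with the already-selected set, the loop can pick up features that matter only through interactions, which pure marginal ranking misses.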