Density-Aware Farthest Point Sampling

📅 2025-09-16
🤖 AI Summary
This paper addresses training regression models when labeled data are scarce. It proposes density-aware farthest point sampling (DA-FPS), a passive, model-agnostic method that selects training points using only their feature representations. For Lipschitz continuous regression models, the authors derive an upper bound on the expected prediction error that depends linearly on the weighted fill distance of the training set, a quantity that can be estimated from the features alone. They prove that DA-FPS provides approximate minimizers for a data-driven estimate of this weighted fill distance, so the method aims to minimize the bound. In contrast to conventional sampling strategies, DA-FPS accounts for both spatial coverage and local data density. In experiments with two regression models on three benchmark datasets, DA-FPS significantly reduces the mean absolute error relative to baselines including random sampling, standard FPS, and core-set selection. The method thus offers a pre-processing step for small-sample regression and a possible starting point for subsequent active learning pipelines.

📝 Abstract
We focus on training machine learning regression models in scenarios where the availability of labeled training data is limited due to computational constraints or high labeling costs. Thus, selecting suitable training sets from unlabeled data is essential for balancing performance and efficiency. For the selection of the training data, we focus on passive and model-agnostic sampling methods that only consider the data feature representations. We derive an upper bound for the expected prediction error of Lipschitz continuous regression models that linearly depends on the weighted fill distance of the training set, a quantity we can estimate simply by considering the data features. We introduce "Density-Aware Farthest Point Sampling" (DA-FPS), a novel sampling method. We prove that DA-FPS provides approximate minimizers for a data-driven estimation of the weighted fill distance, thereby aiming at minimizing our derived bound. We conduct experiments using two regression models across three datasets. The results demonstrate that DA-FPS significantly reduces the mean absolute prediction error compared to other sampling strategies.
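To illustrate why a fill distance can control the error of a Lipschitz continuous model, the following is the classical unweighted, pointwise argument. It is only a sketch under stated assumptions and is not the bound derived in the paper, which concerns the expected error and a weighted fill distance. All symbols are introduced here for illustration: $f$ is the target function with Lipschitz constant $L_f$, $m$ is the trained model with Lipschitz constant $L_m$, $X$ is the selected training set, $D$ is the data domain, and $\varepsilon$ is the largest error of $m$ on the points of $X$.

```latex
% Fill distance of the selected set X in the domain D (unweighted version)
\[
  h_{X,D} \;=\; \sup_{x \in D} \, \min_{x_j \in X} \lVert x - x_j \rVert .
\]
% For any x in D, let x_j be a nearest selected point, so that
% \lVert x - x_j \rVert \le h_{X,D}. The triangle inequality then gives
\begin{align*}
  \lvert f(x) - m(x) \rvert
    &\le \lvert f(x) - f(x_j) \rvert
       + \lvert f(x_j) - m(x_j) \rvert
       + \lvert m(x_j) - m(x) \rvert \\
    &\le L_f \, h_{X,D} + \varepsilon + L_m \, h_{X,D}
     \;=\; (L_f + L_m)\, h_{X,D} + \varepsilon .
\end{align*}
```

Under these assumptions the error grows at most linearly in the fill distance, which is why choosing training points that make this quantity small is a natural selection criterion.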
Problem

Research questions and friction points this paper is trying to address.

Selecting training sets from unlabeled data under computational constraints
Minimizing prediction error for regression models with limited labeled data
Developing passive sampling methods using only data feature representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Density-Aware Farthest Point Sampling method
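Approximately minimizes a data-driven estimate of the weighted fill distance, on which the derived error bound depends linearly
Reduces mean absolute prediction error compared to other sampling strategies, across two regression models and three datasets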
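
As background for the method named above, here is a minimal sketch of standard (unweighted) greedy farthest point sampling, the baseline that DA-FPS extends. It is not the authors' DA-FPS algorithm: the density-aware weighting is not reproduced, because this summary does not specify it. The function name `farthest_point_sampling` and its parameters are hypothetical, and Euclidean distance on the feature vectors is an assumption.

```python
import numpy as np


def farthest_point_sampling(X, k, start=0):
    """Standard greedy FPS (NOT DA-FPS; no density weighting is applied).

    X     : array of shape (n, d) with feature vectors (assumed Euclidean).
    k     : number of points to select, 1 <= k <= n.
    start : index of the first selected point (an arbitrary choice).

    Returns the selected indices and the empirical fill distance of the
    selection, measured over the rows of X.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= number of points")

    selected = [start]
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = np.linalg.norm(X - X[start], axis=1)

    for _ in range(k - 1):
        # Greedy step: pick the point farthest from the current selection.
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        # Update nearest-selected distances with the newly added point.
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))

    # The largest remaining distance is the fill distance on this data set.
    return np.array(selected), float(min_dist.max())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(200, 3))
    idx, fill = farthest_point_sampling(features, k=10)
    print(idx, fill)
```

Each greedy step costs one pass over the data, so selecting `k` of `n` points takes on the order of `n * k` distance evaluations. The returned fill distance is the unweighted quantity computed on the given points only.
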
Paolo Climaco
Institute for Numerical Simulation, University of Bonn; Department of Mathematics, University of California, Los Angeles
Jochen Garcke
Universität Bonn, Fraunhofer SCAI
scientific computing, machine learning, numerical simulation, high-dimensional approximation