🤖 AI Summary
To address regression model training under scarce labeled data, this paper proposes a passive, model-agnostic density-aware farthest point sampling (DA-FPS) method. DA-FPS optimizes the feature-space distribution of the training set to minimize an upper bound on the expected prediction error: leveraging Lipschitz continuity, it constructs an estimable surrogate based on the weighted fill distance and proves that DA-FPS yields approximate minimizers of a data-driven estimate of this quantity. Unlike conventional sampling strategies, DA-FPS jointly accounts for spatial coverage and local data density. Experiments with two regression models on three benchmark datasets show that DA-FPS significantly reduces the mean absolute prediction error, consistently outperforming baselines such as random sampling and standard FPS. The method thus provides an efficient, model-agnostic pre-processing strategy for regression with limited labeling budgets.
📝 Abstract
We focus on training machine learning regression models in scenarios where the availability of labeled training data is limited due to computational constraints or high labeling costs. Selecting suitable training sets from unlabeled data is therefore essential for balancing performance and efficiency. For training-set selection, we consider passive, model-agnostic sampling methods that rely only on the data's feature representations. We derive an upper bound on the expected prediction error of Lipschitz continuous regression models that depends linearly on the weighted fill distance of the training set, a quantity that can be estimated from the data features alone. We introduce "Density-Aware Farthest Point Sampling" (DA-FPS), a novel sampling method. We prove that DA-FPS provides approximate minimizers for a data-driven estimate of the weighted fill distance, thereby aiming to minimize our derived bound. We conduct experiments using two regression models across three datasets. The results demonstrate that DA-FPS significantly reduces the mean absolute prediction error compared to other sampling strategies.
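To make the idea concrete, here is a minimal sketch of a density-aware greedy selection in the spirit described above. The paper's exact algorithm, density weights, and fill-distance estimator are not given here, so everything below is an assumption: we use a simple Gaussian kernel density estimate as the weights, greedily pick the point maximizing the density-weighted distance to the current selection (plain FPS would drop the weights), and estimate the weighted fill distance as the maximum weighted distance from any data point to its nearest selected point.

```python
import numpy as np

def weighted_fill_distance(X, idx, weights):
    # Weighted fill distance estimate (assumed form): max over all points of
    # weight * distance to the nearest selected training point.
    sel = X[idx]
    d = np.linalg.norm(X[:, None, :] - sel[None, :, :], axis=-1).min(axis=1)
    return float((weights * d).max())

def da_fps(X, k, bandwidth=1.0, seed=0):
    # Hypothetical density-aware FPS sketch, not the paper's exact algorithm.
    # Weights: a simple Gaussian kernel density estimate (an assumption).
    rng = np.random.default_rng(seed)
    n = len(X)
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-sq_dists / (2.0 * bandwidth**2)).mean(axis=1)

    idx = [int(rng.integers(n))]                       # random start point
    mind = np.linalg.norm(X - X[idx[0]], axis=1)       # dist to selection
    for _ in range(k - 1):
        # Greedy step: farthest point under density weighting.
        j = int(np.argmax(w * mind))
        idx.append(j)
        mind = np.minimum(mind, np.linalg.norm(X - X[j], axis=1))
    return np.array(idx), w
```

Because the selection is greedy and nested, enlarging the budget `k` can only shrink the weighted fill distance estimate, which is the quantity the derived error bound depends on.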