🤖 AI Summary
This paper addresses the compression and approximation of labeled data, modeled as site-to-value maps between finite metric spaces, without imposing further structural assumptions. It identifies the *discrete modulus of continuity* as a data-intrinsic quantity for measuring the regularity of such maps. Methodologically, it develops a sample-based approximation theory for labeled data. For data subject to statistical uncertainty, it considers multilevel approximation spaces and a variant of the multilevel Monte Carlo method to compute statistical quantities of interest. Theoretically, it investigates the consistency of the discrete modulus of continuity in the infinite-data limit and connects the approximation of labeled data in metric spaces to the covering problem for (random) balls. Algorithmically, it proposes an algorithm for the efficient computation of the discrete modulus of continuity and relates its evaluation to combinatorial optimization. Extensive numerical studies illustrate the feasibility of the approach and validate the theoretical results.
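The text does not state the definition of the discrete modulus of continuity. The sketch below assumes the usual one for a map f: X → Y between finite metric spaces: `ω(δ) = max{ d_Y(f(x), f(x')) : x, x' ∈ X, d_X(x, x') ≤ δ }`, taken as 0 when no pair qualifies. Because X is finite, ω is a non-decreasing step function that can only jump at pairwise site distances. One sorted pass over all O(n²) pairs therefore yields the whole profile. This is a brute-force baseline for illustration, not the paper's efficient algorithm. The function names and the toy data are hypothetical.

```python
import itertools
from bisect import bisect_right
from typing import Callable, List, Sequence, Tuple

Metric = Callable[[object, object], float]


def discrete_modulus(sites: Sequence, values: Sequence, delta: float,
                     d_x: Metric, d_y: Metric) -> float:
    """Brute force (assumed definition): largest value gap over site pairs at distance <= delta."""
    omega = 0.0
    for i, j in itertools.combinations(range(len(sites)), 2):
        if d_x(sites[i], sites[j]) <= delta:
            omega = max(omega, d_y(values[i], values[j]))
    return omega


def modulus_profile(sites: Sequence, values: Sequence,
                    d_x: Metric, d_y: Metric) -> Tuple[List[float], List[float]]:
    """Whole step function omega(.) from one sorted pass over all site pairs.

    omega(delta) == omegas[k], where deltas[k] is the largest breakpoint <= delta.
    """
    pairs = sorted(
        (d_x(sites[i], sites[j]), d_y(values[i], values[j]))
        for i, j in itertools.combinations(range(len(sites)), 2)
    )
    deltas: List[float] = []
    omegas: List[float] = []
    running = 0.0
    for dist, gap in pairs:
        running = max(running, gap)  # omega never decreases as delta grows
        if deltas and deltas[-1] == dist:
            omegas[-1] = running     # several pairs share this site distance
        else:
            deltas.append(dist)
            omegas.append(running)
    return deltas, omegas


def modulus_at(profile: Tuple[List[float], List[float]], delta: float) -> float:
    """Evaluate the step function by binary search; 0 below the first breakpoint."""
    deltas, omegas = profile
    k = bisect_right(deltas, delta)
    return omegas[k - 1] if k > 0 else 0.0


def absdiff(a, b):
    """Metric |a - b| on the real line, used here for both sites and labels."""
    return abs(a - b)


# Hypothetical toy labeled data: five sites on the real line with scalar labels.
sites = [0, 1, 2, 5, 10]
values = [0.0, 0.0, 1.0, 1.0, 3.0]
profile = modulus_profile(sites, values, absdiff, absdiff)
print(modulus_at(profile, 5))  # 2.0
```

Precomputing the profile pays off when ω is queried at many δ values, because each later query costs one binary search instead of another pass over all pairs.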
📝 Abstract
Analysis and processing of data is a vital part of our modern society and requires vast amounts of computational resources. To reduce the computational burden, compressing and approximating data has become a central topic. We consider the approximation of labeled data samples, mathematically described as site-to-value maps between finite metric spaces. Within this setting, we identify the discrete modulus of continuity as an effective data-intrinsic quantity to measure regularity of site-to-value maps without imposing further structural assumptions. We investigate the consistency of the discrete modulus of continuity in the infinite-data limit and propose an algorithm for its efficient computation. Building on these results, we present a sample-based approximation theory for labeled data. For data subject to statistical uncertainty we consider multilevel approximation spaces and a variant of the multilevel Monte Carlo method to compute statistical quantities of interest. Our considerations connect approximation theory for labeled data in metric spaces to the covering problem for (random) balls on the one hand and the efficient evaluation of the discrete modulus of continuity to combinatorial optimization on the other hand. We provide extensive numerical studies to illustrate the feasibility of the approach and to validate our theoretical results.
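The abstract links the approximation of labeled data to covering by balls but does not spell out the scheme. The sketch below illustrates that connection under stated assumptions. It covers the sites greedily with balls of radius δ (a δ-net). It keeps only the centers and their labels as the compressed representation, and it extends them to all sites by nearest-center lookup. Every site lies within δ of its nearest center. So, assuming the usual definition `ω(δ) = max{ d_Y(f(x), f(x')) : d_X(x, x') ≤ δ }`, the pointwise error is at most ω(δ). The greedy net is a standard heuristic swapped in for illustration, not the paper's method. The function names and data are hypothetical.

```python
from typing import Callable, List, Sequence

Metric = Callable[[object, object], float]


def greedy_delta_net(sites: Sequence, delta: float, d_x: Metric) -> List[int]:
    """Indices of centers such that every site lies within delta of some center."""
    centers: List[int] = []
    for i, x in enumerate(sites):
        # Open a new ball only if x is not already covered by an existing one.
        if all(d_x(x, sites[c]) > delta for c in centers):
            centers.append(i)
    return centers


def nearest_center_approx(sites: Sequence, values: Sequence,
                          centers: List[int], d_x: Metric) -> list:
    """Piecewise-constant approximant: each site takes its nearest center's label."""
    return [values[min(centers, key=lambda c: d_x(x, sites[c]))] for x in sites]


def discrete_modulus(sites: Sequence, values: Sequence, delta: float,
                     d_x: Metric, d_y: Metric) -> float:
    """Assumed definition: largest label gap over site pairs at distance <= delta."""
    n = len(sites)
    return max((d_y(values[i], values[j])
                for i in range(n) for j in range(i + 1, n)
                if d_x(sites[i], sites[j]) <= delta), default=0.0)


def absdiff(a, b):
    """Metric |a - b| on the real line, used here for both sites and labels."""
    return abs(a - b)


# Hypothetical toy labeled data: five sites on the real line with scalar labels.
sites = [0, 1, 2, 5, 10]
values = [0.0, 0.0, 1.0, 1.0, 3.0]
delta = 2
centers = greedy_delta_net(sites, delta, absdiff)
approx = nearest_center_approx(sites, values, centers, absdiff)
err = max(absdiff(v, a) for v, a in zip(values, approx))
print(centers, err, discrete_modulus(sites, values, delta, absdiff, absdiff))
# [0, 3, 4] 1.0 1.0
```

In this toy case, 3 of the 5 labeled samples are stored. A smaller δ tightens the error bound ω(δ) but generally requires more centers, which is the compression/accuracy trade-off the modulus quantifies.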