🤖 AI Summary
Traditional functional data analysis relies on the Lebesgue measure over a fixed domain. This limits its adaptability to unbounded domains and to non-uniformly distributed data, which in turn constrains model expressivity and predictive accuracy. To address this, we propose a data-driven framework for adaptive measure selection that builds functional linear models in a Hilbert space whose inner product is defined by an arbitrary measure. Choosing this measure, and hence the inner-product structure, from the data makes the model more flexible. The method combines measure-adaptive functional principal component analysis with functional linear regression, enabling principled modeling on unbounded domains and under non-standard data distributions. Experiments on synthetic data and on real-world COVID-19 epidemiological and NHANES (National Health and Nutrition Examination Survey) datasets show that the measure-adaptive models consistently outperform conventional Lebesgue-based approaches in prediction accuracy. These results indicate that the choice of measure, and not only the functional form of the model, is critical for statistical performance in functional data analysis.
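To make the measure-defined inner product concrete, here is a short formulation. The notation (domain $\mathcal{T}$, measure $\mu$, coefficient function $\beta$, covariance surface $G$, eigenpairs $(\lambda_k, \phi_k)$) is ours. It is a standard way to write such a model, not necessarily the paper's exact definitions: the inner product of the space, the functional linear model, and the FPCA eigenproblem all integrate against $\mu$ instead of Lebesgue measure.

```latex
\langle f, g \rangle_{\mu} = \int_{\mathcal{T}} f(t)\, g(t)\, d\mu(t),
\qquad
Y = \alpha + \int_{\mathcal{T}} X(t)\, \beta(t)\, d\mu(t) + \varepsilon,
\qquad
\int_{\mathcal{T}} G(s, t)\, \phi_k(t)\, d\mu(t) = \lambda_k\, \phi_k(s),
\quad \langle \phi_j, \phi_k \rangle_{\mu} = \delta_{jk}.
```

For instance, on $\mathcal{T} = [0, \infty)$ the constant function $f(t) = 1$ is not square-integrable under Lebesgue measure, since $\int_0^\infty 1\, dt = \infty$. With $d\mu(t) = e^{-t}\, dt$, however, $\int_0^\infty 1^2 e^{-t}\, dt = 1$, so $f \in L^2(\mu)$. Taking $\mu$ to be Lebesgue measure on a bounded interval recovers the conventional functional linear model and FPCA.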
📝 Abstract
Advancements in modern science have led to an increased prevalence of functional data, which are usually viewed as elements of the space of square-integrable functions $L^2$. Core methods in functional data analysis, such as functional principal component analysis, are typically grounded in the Hilbert structure of $L^2$ and rely on inner products based on integrals with respect to the Lebesgue measure over a fixed domain. A more flexible framework is proposed in which the measure can be arbitrary. This allows natural extensions to unbounded domains and raises the question of how to choose the measure optimally. Specifically, a novel functional linear model is introduced that incorporates a data-adaptive choice of the measure that defines the space, alongside an enhanced functional principal component analysis. Selecting a good measure can improve the model's predictive performance, especially when the underlying processes are poorly represented under the default Lebesgue measure. Simulations, as well as applications to COVID-19 data and the National Health and Nutrition Examination Survey data, show that the proposed approach consistently outperforms the conventional functional linear model.
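The abstract does not give estimation details, so the following is a minimal sketch under stated assumptions, not the authors' method. It assumes the following:

- Curves are observed on a common grid.
- $\mu$ has a density with respect to Lebesgue measure, and the $\mu$-inner product is approximated by trapezoidal quadrature.
- The model is fit by truncated FPCA followed by least squares on the scores, a standard functional linear model estimator.

The names `quadrature_weights`, `weighted_fpca`, `fit_flm` and `predict_flm` are hypothetical. So are the exponential family of candidate densities and the hold-out rule used to "adapt" $\mu$. The paper's data-adaptive criterion may well differ.

```python
import numpy as np


def quadrature_weights(t, density):
    """Trapezoidal weights on grid t times the density of mu w.r.t. Lebesgue measure."""
    dt = np.diff(t)
    w = np.zeros_like(t)
    w[:-1] += dt / 2
    w[1:] += dt / 2
    return w * density(t)


def weighted_fpca(X, w, n_comp):
    """FPCA of curves X (n x p) under <f, g>_mu ~= sum_j f(t_j) g(t_j) w_j."""
    mean = X.mean(axis=0)
    Xc = X - mean
    G = Xc.T @ Xc / (X.shape[0] - 1)  # discretized covariance surface G(s, t)
    sw = np.sqrt(w)
    # Solve G W phi = lam phi through the symmetric matrix W^{1/2} G W^{1/2}.
    evals, U = np.linalg.eigh(sw[:, None] * G * sw[None, :])
    order = np.argsort(evals)[::-1][:n_comp]
    phi = U[:, order] / sw[:, None]  # eigenfunctions, orthonormal in L2(mu)
    scores = Xc @ (phi * w[:, None])  # xi_ik = <X_i - mean, phi_k>_mu
    return mean, evals[order], phi, scores


def fit_flm(X, y, w, n_comp):
    """Fit Y = alpha + <X - mean, beta>_mu + eps with beta in the span of n_comp eigenfunctions."""
    mean, _, phi, scores = weighted_fpca(X, w, n_comp)
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    beta = phi @ coef[1:]  # beta(t_j) = sum_k b_k phi_k(t_j)
    return coef[0], beta, mean


def predict_flm(X_new, alpha, beta, mean, w):
    """Predict alpha + <X_new - mean, beta>_mu using the same quadrature weights."""
    return alpha + (X_new - mean) @ (beta * w)


# Synthetic example; the grid on [0, 10] stands in for the unbounded domain [0, inf).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 201)
w = quadrature_weights(t, lambda s: np.exp(-s))  # hypothetical mu with density e^{-t}
n = 300
basis = np.vstack([np.ones_like(t), np.sin(t), np.cos(t)])
X = rng.normal(size=(n, 3)) @ basis
y = 1.0 + X @ (np.sin(t) * w) + 0.1 * rng.normal(size=n)
alpha, beta, mean = fit_flm(X, y, w, n_comp=3)

# Hypothetical data-adaptive step: choose the rate of an exponential density
# by hold-out mean squared error of a truncated (2-component) model.
train, val = np.arange(200), np.arange(200, n)
errors = {}
for rate in (0.1, 0.5, 1.0, 2.0):
    w_r = quadrature_weights(t, lambda s: np.exp(-rate * s))
    a_r, b_r, m_r = fit_flm(X[train], y[train], w_r, n_comp=2)
    errors[rate] = np.mean((predict_flm(X[val], a_r, b_r, m_r, w_r) - y[val]) ** 2)
best_rate = min(errors, key=errors.get)
```

Because each grid point is weighted by $w_j$, the estimated eigenfunctions are orthonormal in $L^2(\mu)$ rather than in $L^2$ with Lebesgue measure. Setting the density to 1 on a bounded grid reduces the sketch to ordinary Lebesgue-based FPCA regression. Changing $\mu$ changes which directions the truncated expansion keeps, and this is why the choice of measure can affect prediction.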