🤖 AI Summary
This work addresses two challenges: estimating the intrinsic dimension (ID) of high-dimensional data accurately, and achieving effective dimensionality reduction together with high-fidelity reconstruction. It proposes an autoencoder framework built around re-weighted double CancelOut layers. Methodologically, it introduces a projected reconstruction loss that measures how removing one additional latent dimension affects reconstruction error, so that ID estimation and data reconstruction are optimized jointly. The CancelOut layers learn the importance of each latent dimension during training. The method is evaluated on theoretical benchmarks, where it is compared with state-of-the-art ID estimators, and on data from the numerical solution of a vertically resolved one-dimensional free-surface flow. In both settings it estimates the ID with good accuracy and versatility, and it reconstructs the original data directly from the identified low-dimensional latent space. This makes it a promising tool for dimensionality-reduction modeling of complex physical systems.
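To make the gating idea concrete, here is a minimal Python sketch of a CancelOut-style layer. It assumes the basic single-gate form, in which each latent coordinate is multiplied by the sigmoid of a learnable logit. The re-weighting and the "double" arrangement used in IDEA are not reproduced here. The helper names `cancelout_gate` and `count_active_dims`, as well as the 0.5 threshold for counting a dimension as active, are hypothetical choices made for illustration.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def cancelout_gate(z, logits):
    """CancelOut-style gate: scale each latent coordinate by sigmoid(logit)."""
    if len(z) != len(logits):
        raise ValueError("z and logits must have the same length")
    return [zi * sigmoid(wi) for zi, wi in zip(z, logits)]

def count_active_dims(logits, threshold=0.5):
    """Hypothetical ID readout: count latent dims whose gate exceeds the threshold."""
    return sum(1 for w in logits if sigmoid(w) > threshold)

# Hand-picked logits: two gates nearly open, two nearly closed.
logits = [4.0, 3.5, -5.0, -6.0]
z = [0.8, -1.2, 0.5, 2.0]
print(cancelout_gate(z, logits))
print(count_active_dims(logits))  # → 2
```

In a trained model the logits would be learned jointly with the encoder and decoder. Here they are fixed by hand so that the effect of open and closed gates on the latent vector is easy to see.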
📝 Abstract
This paper introduces the Intrinsic Dimension Estimating Autoencoder (IDEA), which identifies the underlying intrinsic dimension of a wide range of datasets whose samples lie on either linear or nonlinear manifolds. Beyond estimating the intrinsic dimension, IDEA can also reconstruct the original dataset after projecting it onto the corresponding latent space, which is structured using re-weighted double CancelOut layers. Our key contribution is the projected reconstruction loss term, which guides training by continuously assessing the reconstruction quality when one additional latent dimension is removed. We first assess the performance of IDEA on a series of theoretical benchmarks to validate its robustness. These experiments allow us to test its reconstruction ability and to compare its performance with state-of-the-art intrinsic dimension estimators. The benchmarks show good accuracy and high versatility of our approach. Subsequently, we apply our model to data generated from the numerical solution of a vertically resolved one-dimensional free-surface flow, following a pointwise discretization of the vertical velocity profile in the horizontal direction, vertical direction, and time. IDEA succeeds in estimating the dataset's intrinsic dimension and then reconstructs the original solution by working directly within the projection space identified by the network.
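The intuition behind the projected reconstruction loss can be illustrated with a toy sketch. It assumes a fixed linear encoder/decoder (an orthonormal basis already ordered by importance) in place of IDEA's trained nonlinear autoencoder. It also replaces training with a hypothetical greedy loop that keeps dropping the least important latent dimension as long as the loss with one more dimension removed stays at round-off level. The function names, the tolerance, and the toy data (points on a 2D plane in 3D) are all illustrative assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reconstruct(x, basis, k):
    """Encode x onto the first k basis vectors, then decode back to the ambient space."""
    out = [0.0] * len(x)
    for e in basis[:k]:
        c = dot(x, e)  # latent coordinate along e
        out = [o + c * ei for o, ei in zip(out, e)]
    return out

def mse(data, basis, k):
    """Mean squared reconstruction error when only k latent dimensions are kept."""
    total = 0.0
    for x in data:
        r = reconstruct(x, basis, k)
        total += sum((xi - ri) ** 2 for xi, ri in zip(x, r))
    return total / (len(data) * len(data[0]))

def losses(data, basis, k):
    """(reconstruction loss with k dims, projected loss with one more dim removed)."""
    return mse(data, basis, k), mse(data, basis, k - 1)

def estimate_id_by_pruning(data, basis, tol=1e-10):
    """Hypothetical greedy readout: drop dims while the projected loss stays ~0."""
    k = len(basis)
    while k > 1 and losses(data, basis, k)[1] < tol:
        k -= 1
    return k

# Toy data: points on a 2D plane in R^3, spanned by the first two basis vectors.
s = 1.0 / math.sqrt(2.0)
basis = [[s, s, 0.0], [0.0, 0.0, 1.0], [s, -s, 0.0]]  # orthonormal, ordered by importance
coeffs = [(1.0, 2.0), (-0.5, 1.5), (3.0, -1.0), (0.2, 0.4)]
data = [[a * p + b * q for p, q in zip(basis[0], basis[1])] for a, b in coeffs]

print(losses(data, basis, 2))  # reconstruction ~0, projected loss clearly > 0
print(estimate_id_by_pruning(data, basis))  # → 2
```

With three latent dimensions, removing the third one loses nothing, so the projected loss stays at round-off level and the loop prunes it. At two dimensions, removing one more discards the component along the second basis vector. The projected loss then jumps to a clearly nonzero value and the loop stops at the plane's dimension.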