🤖 AI Summary
Existing dataset characterization methods—statistical, structural, and model-driven—lack sufficient interpretability and deep structural insight. To address this, we propose a novel tensor-based representation paradigm that transcends conventional two-dimensional assumptions. Our approach leverages high-order tensor decomposition, multilinear modeling, and cross-modal joint representation to explicitly capture high-dimensional, nonlinear, and multi-source relational structures inherent in complex data. Extensive experiments demonstrate that the proposed method significantly outperforms baseline approaches in three key aspects: (i) disentangling intricate data structures, (ii) enhancing feature interpretability, and (iii) enabling traceable downstream task reasoning. This work establishes a unified tensor modeling framework for dataset representation and pioneers a data-driven discovery pathway tailored for explainable AI. It contributes both theoretical advances—through formalizing multilinear structure learning—and practical utility—by providing an interpretable, computationally grounded toolkit for transparent data analysis.
📝 Abstract
In the evolving domains of Machine Learning and Data Analytics, existing dataset characterization methods such as statistical, structural, and model-based analyses often fail to deliver the deep understanding and insights essential for innovation and explainability. This work surveys the current state-of-the-art conventional data analytic techniques and examines their limitations, and discusses a variety of tensor-based methods and how these may provide a more robust alternative to traditional statistical, structural, and model-based dataset characterization techniques. Through examples, we illustrate how tensor methods unveil nuanced data characteristics, offering enhanced interpretability and actionable intelligence. We advocate for the adoption of tensor-based characterization, promising a leap forward in understanding complex datasets and paving the way for intelligent, explainable data-driven discoveries.