SPATA: Systematic Pattern Analysis for Detailed and Transparent Data Cards

📅 2025-09-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
How can external AI/ML models be robustly verified and interpretably assessed for privacy-sensitive applications—without accessing original training or test data? This paper proposes SPATA, a statistical pattern projection framework that maps tabular data into domain-agnostic, discretized, deterministic pattern representations. By implicitly modeling decision boundaries via these patterns, SPATA avoids exposing sensitive raw data. It establishes the first deterministic verification framework enabling both quantitative robustness analysis and interpretable report generation without requiring original datasets. Its core innovation lies in substituting individual instances with reproducible statistical patterns—thereby preserving data privacy while enabling reliable feature attribution and auditability of model behavior. Experiments demonstrate that SPATA significantly enhances transparency, verifiability, and trustworthiness of AI systems under strict privacy constraints.

Technology Category

Application Category

📝 Abstract
Due to the susceptibility of Artificial Intelligence (AI) to data perturbations and adversarial examples, it is crucial to perform a thorough robustness evaluation before any Machine Learning (ML) model is deployed. However, examining a model's decision boundaries and identifying potential vulnerabilities typically requires access to the training and testing datasets, which may pose risks to data privacy and confidentiality. To improve transparency in organizations that handle confidential data or manage critical infrastructure, it is essential to allow external verification and validation of AI without the disclosure of private datasets. This paper presents Systematic Pattern Analysis (SPATA), a deterministic method that converts any tabular dataset to a domain-independent representation of its statistical patterns, to provide more detailed and transparent data cards. SPATA computes the projection of each data instance into a discrete space where they can be analyzed and compared, without risking data leakage. These projected datasets can be reliably used for the evaluation of how different features affect ML model robustness and for the generation of interpretable explanations of their behavior, contributing to more trustworthy AI.
Problem

Research questions and friction points this paper is trying to address.

Evaluating AI robustness without exposing private training data
Converting tabular datasets to secure statistical pattern representations
Enabling external validation of ML models while preserving confidentiality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Converts datasets to domain-independent statistical patterns
Projects data instances into discrete space for analysis
Enables ML robustness evaluation without data leakage
🔎 Similar Papers
No similar papers found.
J
João Vitorino
GECAD, ISEP, Polytechnic of Porto
Eva Maia
Eva Maia
GECAD-ISEP
CyberSecurityArtificial InteligenceMachine LearningIndustry 4.0Encryption
Isabel Praça
Isabel Praça
Professor, ISEP
C
Carlos Soares
Faculty of Engineering, University of Porto