Scalable extensions to given-data Sobol' index estimators

📅 2025-09-10

🤖 AI Summary

Existing given-data methods for estimating Sobol' indices do not scale to variance-based sensitivity analysis of models with an extremely large number of inputs (e.g., large neural networks with >10⁴ parameterizable inputs), and their equiprobable partitions are poorly suited to nonstandard input distributions with many repeated values. This paper proposes practical extensions to the given-data Sobol' index method. First, it gives a general definition of the estimator for an arbitrary partition. Second, it designs a streaming algorithm that processes input-output samples in batches, so that all evaluations need not be held in memory at once. Third, it introduces a heuristic that filters out small indices that cannot be distinguished from zero because of statistical noise. The paper also shows that the equiprobable partition can bias Sobol' index estimates significantly, even at large sample sizes. The streaming algorithm achieves accuracy and runtimes comparable to current methods that process all samples at once, with lower memory requirements, and the developments are demonstrated on two neural network applications.

📝 Abstract

Given-data methods for variance-based sensitivity analysis have significantly advanced the feasibility of Sobol' index computation for computationally expensive models and models with many inputs. However, the limitations of existing methods still preclude their application to models with an extremely large number of inputs. In this work, we present practical extensions to the existing given-data Sobol' index method that allow variance-based sensitivity analysis to be performed efficiently on large models, such as neural networks with $>10^4$ parameterizable inputs. For models of this size, holding all input-output evaluations simultaneously in memory -- as required by existing methods -- can quickly become impractical. These extensions also support nonstandard input distributions with many repeated values, which are not amenable to the equiprobable partitions employed by existing given-data methods. Our extensions include a general definition of the given-data Sobol' index estimator for an arbitrary partition, a streaming algorithm to process input-output samples in batches, and a heuristic to filter out small indices that are indistinguishable from zero indices due to statistical noise. We show that the equiprobable partition employed in existing given-data methods can introduce significant bias into Sobol' index estimates even at large sample sizes and provide numerical analyses that demonstrate why this can occur. We also show that our streaming algorithm can achieve comparable accuracy and runtimes with lower memory requirements, relative to current methods that process all samples at once. We demonstrate our novel developments on two application problems in neural network modeling.
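
As a rough illustration of the kind of estimator the abstract refers to, the sketch below uses the standard given-data form: the first-order index of an input is the occupancy-weighted variance of the per-bin output means, divided by the total output variance. The function name `first_order_index`, the `bin_of` callable, the fixed-width partition, and the test model y = 2*x1 + x2 are all assumptions made for this example; they are not the paper's code or notation.

```python
import random
from collections import defaultdict


def first_order_index(x, y, bin_of):
    """Given-data estimate of the first-order Sobol' index of one input.

    x, y   : equal-length sequences of input values and model outputs
    bin_of : hypothetical callable mapping an input value to a partition label;
             any labelling works, the bins need not be equiprobable
    """
    n = len(y)
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n

    counts = defaultdict(int)
    sums = defaultdict(float)
    for xi, yi in zip(x, y):
        label = bin_of(xi)
        counts[label] += 1
        sums[label] += yi

    # Variance of the conditional (per-bin) means, weighted by bin occupancy.
    between = sum(c * (sums[b] / c - mean_y) ** 2 for b, c in counts.items()) / n
    return between / var_y


# Toy model (an assumption for this sketch): y = 2*x1 + x2 with independent
# uniform inputs on [0, 1), so the exact indices are 0.8 and 0.2.
rng = random.Random(0)
x1 = [rng.random() for _ in range(20000)]
x2 = [rng.random() for _ in range(20000)]
y = [2.0 * a + b for a, b in zip(x1, x2)]


def fixed_width(v):
    """Twenty fixed-width bins on [0, 1)."""
    return min(int(v * 20), 19)


s1 = first_order_index(x1, y, fixed_width)
s2 = first_order_index(x2, y, fixed_width)
print(round(s1, 3), round(s2, 3))
```

Because the partition is passed in as a function, the same code accepts quantile-based bins, fixed-width bins, or one bin per distinct value. Only the labelling changes.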

Problem

Research questions and friction points this paper is trying to address.

Extending given-data Sobol' index estimators to models with an extremely large number of inputs

Enabling sensitivity analysis on neural networks with many parameters

Handling nonstandard input distributions with repeated values
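
To see why repeated values are awkward for an equiprobable partition, the sketch below builds equal-count bin edges from a sample in which most values are identical. The input distribution (80% exact zeros) and the helper `quantile_edges` are assumptions chosen for illustration only.

```python
import random


def quantile_edges(values, n_bins):
    """Interior edges of an equal-count partition, read off the sorted sample."""
    s = sorted(values)
    n = len(s)
    return [s[(k * n) // n_bins] for k in range(1, n_bins)]


rng = random.Random(0)
# Hypothetical input: about 80% exact zeros, the rest continuous in [0, 1).
x = [0.0 if rng.random() < 0.8 else rng.random() for _ in range(10000)]

edges = quantile_edges(x, n_bins=10)
distinct_edges = sorted(set(edges))

# Most of the nine interior edges coincide at 0.0, so the ten nominally
# equiprobable bins collapse into far fewer usable ones.
print(len(edges), len(distinct_edges))
```

When edges coincide like this, the bins can no longer hold equal shares of the sample, which is one reason a partition definition that does not assume equiprobability is useful for such inputs.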

Innovation

Methods, ideas, or system contributions that make the work stand out.

Streaming algorithm for batched sample processing

General definition with arbitrary partition support

Heuristic filter to eliminate negligible indices
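
The batched processing idea can be sketched with running sums: if only per-bin counts and output sums are kept for each input, plus the global sums of y and y², the samples themselves can be discarded after each batch. This is a minimal sketch under that assumption and not the paper's algorithm. The class name `StreamingSobol`, the fixed-width bins, and the toy model y = 2*x1 + x2 are hypothetical.

```python
import random
from collections import defaultdict


class StreamingSobol:
    """Running sums for given-data first-order indices (illustrative only)."""

    def __init__(self, bin_fns):
        self.bin_fns = bin_fns  # one partition function per input
        self.n = 0
        self.sum_y = 0.0
        self.sum_y2 = 0.0
        self.counts = [defaultdict(int) for _ in bin_fns]
        self.sums = [defaultdict(float) for _ in bin_fns]

    def update(self, batch_x, batch_y):
        """Fold one batch in; batch_x is a list of rows, one value per input."""
        for row, yi in zip(batch_x, batch_y):
            self.n += 1
            self.sum_y += yi
            self.sum_y2 += yi * yi
            for i, (fn, v) in enumerate(zip(self.bin_fns, row)):
                label = fn(v)
                self.counts[i][label] += 1
                self.sums[i][label] += yi

    def indices(self):
        """First-order index estimates from the sums accumulated so far."""
        mean_y = self.sum_y / self.n
        # Simple sum-of-squares variance; a Welford-style update would be
        # more robust numerically but is omitted to keep the sketch short.
        var_y = self.sum_y2 / self.n - mean_y ** 2
        out = []
        for counts, sums in zip(self.counts, self.sums):
            between = sum(
                c * (sums[b] / c - mean_y) ** 2 for b, c in counts.items()
            ) / self.n
            out.append(between / var_y)
        return out


def fixed_width(v):
    """Twenty fixed-width bins on [0, 1)."""
    return min(int(v * 20), 19)


def run(batch_size, n=20000, seed=0):
    """Generate the toy data batch by batch and accumulate the sums."""
    rng = random.Random(seed)
    acc = StreamingSobol([fixed_width, fixed_width])
    for start in range(0, n, batch_size):
        m = min(batch_size, n - start)
        xs = [[rng.random(), rng.random()] for _ in range(m)]
        ys = [2.0 * a + b for a, b in xs]  # toy model, exact indices 0.8 and 0.2
        acc.update(xs, ys)
    return acc.indices()


s_small = run(batch_size=500)    # many small batches
s_large = run(batch_size=20000)  # one batch holding every sample
print([round(s, 3) for s in s_small])
```

In this sketch the memory held between batches grows with the number of inputs times the number of bins, not with the number of samples, and the batch size does not change the result because the same sums are accumulated either way.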
