🤖 AI Summary
This work addresses a fundamental trade-off in adaptive data analysis between computational efficiency and sample complexity: computationally efficient algorithms are typically suboptimal in sample usage, whereas sample-optimal methods are often computationally infeasible. Focusing on settings where the data distribution is dense relative to a known prior, the paper proposes an adaptive query mechanism that eschews differential privacy yet satisfies Predicate Singling Out (PSO) security. Built upon a distribution-specific learning framework, the method achieves, for the first time in the dense regime, the optimal $O(\log T)$ sample complexity while remaining computationally efficient, significantly improving upon the $O(\sqrt{T})$ sample requirements of conventional efficient approaches. This result also uncovers an intrinsic connection between adaptive data analysis and PSO security.
📝 Abstract
Modern data workflows are inherently adaptive, repeatedly querying the same dataset to refine and validate sequential decisions, but such adaptivity can lead to overfitting and invalid statistical inference. Adaptive Data Analysis (ADA) mechanisms address this challenge; however, there is a fundamental tension between computational efficiency and sample complexity. For $T$ rounds of adaptive analysis, computationally efficient algorithms typically incur suboptimal $O(\sqrt{T})$ sample complexity, whereas statistically optimal $O(\log T)$ algorithms are computationally intractable under standard cryptographic assumptions. In this work, we shed light on this trade-off by identifying a natural class of data distributions under which both computational efficiency and optimal sample complexity are achievable. We propose a computationally efficient ADA mechanism that attains optimal $O(\log T)$ sample complexity when the data distribution is dense with respect to a known prior. This setting includes, in particular, feature--label data distributions arising in distribution-specific learning. As a consequence, our mechanism also yields a sample-efficient (i.e., $O(\log T)$ samples) statistical query oracle in the distribution-specific setting. Moreover, although our algorithm is not based on differential privacy, it satisfies a relaxed privacy notion known as Predicate Singling Out (PSO) security (Cohen and Nissim, 2020). Our results thus reveal an inherent connection between adaptive data analysis and privacy beyond differential privacy.
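To make the statistical-query setup concrete, here is a minimal sketch of the classical noise-addition baseline the abstract contrasts against, not the paper's mechanism: an oracle answers each adaptive query $q : X \to [0,1]$ with the empirical mean on the sample plus Gaussian noise, with the noise scale growing like $\sqrt{T}$ over $T$ rounds. The class name `SQOracle` and the noise calibration below are illustrative assumptions, not from the paper.

```python
import numpy as np

class SQOracle:
    """Answers statistical queries q: X -> [0, 1] on a fixed sample.

    Illustrative Gaussian-noise baseline (not the paper's mechanism):
    with noise scaled ~ sqrt(T), total error over T adaptive rounds
    stays bounded, which is why efficient DP-style mechanisms pay an
    O(sqrt(T)) sample cost. The paper's contribution is a non-DP,
    PSO-secure mechanism achieving O(log T) in the dense regime.
    """

    def __init__(self, data, num_queries, alpha=0.1, seed=0):
        self.data = np.asarray(data)
        # Heuristic sqrt(T) noise calibration; alpha is the target accuracy.
        self.sigma = np.sqrt(num_queries) * alpha / 10
        self.rng = np.random.default_rng(seed)

    def answer(self, query):
        # Empirical mean of the bounded predicate, perturbed and clipped to [0, 1].
        empirical = np.mean([query(x) for x in self.data])
        noisy = empirical + self.rng.normal(0.0, self.sigma)
        return float(np.clip(noisy, 0.0, 1.0))

# Usage: estimate P[x > 0] under a standard normal, true value 0.5.
rng = np.random.default_rng(1)
oracle = SQOracle(rng.normal(size=1000), num_queries=50)
ans = oracle.answer(lambda x: float(x > 0))
```

The clipping step keeps answers in the query's range; in the actual ADA literature the analyst may issue the next query after seeing `ans`, which is exactly the adaptivity that the sample-complexity bounds must withstand.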