🤖 AI Summary
This work addresses a fundamental trade-off in adaptive data analysis between computational efficiency and sample complexity: computationally efficient algorithms are typically suboptimal in sample usage, whereas sample-optimal methods are often computationally infeasible. Focusing on settings where the data distribution is dense relative to a known prior, the paper proposes an adaptive query mechanism that eschews differential privacy yet satisfies Predicate Singling Out (PSO) security. Built upon a distribution-specific learning framework, the method achieves, for the first time in the dense regime, the optimal $O(\log T)$ sample complexity while remaining computationally efficient, significantly improving upon the $O(\sqrt{T})$ sample requirements of conventional efficient approaches. This result also uncovers an intrinsic connection between adaptive data analysis and PSO security.
📝 Abstract
Modern data workflows are inherently adaptive, repeatedly querying the same dataset to refine and validate sequential decisions, but such adaptivity can lead to overfitting and invalid statistical inference. Adaptive Data Analysis (ADA) mechanisms address this challenge; however, there is a fundamental tension between computational efficiency and sample complexity. For $T$ rounds of adaptive analysis, computationally efficient algorithms typically incur suboptimal $O(\sqrt{T})$ sample complexity, whereas statistically optimal $O(\log T)$ algorithms are computationally intractable under standard cryptographic assumptions. In this work, we shed light on this trade-off by identifying a natural class of data distributions under which both computational efficiency and optimal sample complexity are achievable. We propose a computationally efficient ADA mechanism that attains optimal $O(\log T)$ sample complexity when the data distribution is dense with respect to a known prior. This setting includes, in particular, feature--label data distributions arising in distribution-specific learning. As a consequence, our mechanism also yields a sample-efficient (i.e., $O(\log T)$ samples) statistical query oracle in the distribution-specific setting. Moreover, although our algorithm is not based on differential privacy, it satisfies a relaxed privacy notion known as Predicate Singling Out (PSO) security (Cohen and Nissim, 2020). Our results thus reveal an inherent connection between adaptive data analysis and privacy beyond differential privacy.
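To make the statistical-query setup concrete, here is a minimal sketch of the classical noise-addition baseline the abstract contrasts against, not the paper's mechanism: an oracle answers each adaptive query $q : X \to [0,1]$ with the empirical mean on the sample plus Gaussian noise, with the noise scale growing like $\sqrt{T}$ over $T$ rounds. The class name `SQOracle` and the noise calibration below are illustrative assumptions, not from the paper.

```python
import numpy as np

class SQOracle:
    """Answers statistical queries q: X -> [0, 1] on a fixed sample.

    Illustrative Gaussian-noise baseline (not the paper's mechanism):
    with noise scaled ~ sqrt(T), total error over T adaptive rounds
    stays bounded, which is why efficient DP-style mechanisms pay an
    O(sqrt(T)) sample cost. The paper's contribution is a non-DP,
    PSO-secure mechanism achieving O(log T) in the dense regime.
    """

    def __init__(self, data, num_queries, alpha=0.1, seed=0):
        self.data = np.asarray(data)
        # Heuristic sqrt(T) noise calibration; alpha is the target accuracy.
        self.sigma = np.sqrt(num_queries) * alpha / 10
        self.rng = np.random.default_rng(seed)

    def answer(self, query):
        # Empirical mean of the bounded predicate, perturbed and clipped to [0, 1].
        empirical = np.mean([query(x) for x in self.data])
        noisy = empirical + self.rng.normal(0.0, self.sigma)
        return float(np.clip(noisy, 0.0, 1.0))

# Usage: estimate P[x > 0] under a standard normal, true value 0.5.
rng = np.random.default_rng(1)
oracle = SQOracle(rng.normal(size=1000), num_queries=50)
ans = oracle.answer(lambda x: float(x > 0))
```

The clipping step keeps answers in the query's range; in the actual ADA literature the analyst may issue the next query after seeing `ans`, which is exactly the adaptivity that the sample-complexity bounds must withstand.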