Zero-disparity Distribution Synthesis: Fast Exact Calculation of Chi-Squared Statistic Distribution for Discrete Uniform Histograms

📅 2025-06-29
🤖 AI Summary
Problem: Pearson’s chi-squared test commonly relies on the continuous χ² distribution to approximate the discrete sampling distribution of the test statistic under uniformity, but this approximation incurs substantial errors, especially when expected frequencies are low or tail probabilities are required.
Method: We propose the first efficient algorithm for computing the exact distribution of the chi-squared statistic under the discrete uniform null hypothesis, combining dynamic programming with Zero-disparity Distribution Synthesis to keep the computation tractable.
Contribution/Results: The method enables the first systematic quantification of the bias of the χ² approximation in the tail region, revealing significant p-value errors (and consequent Type I error inflation) when expected frequencies fall below 5. Empirical evaluation confirms that approximation errors can exceed an order of magnitude. These findings correct long-standing assumptions in statistical practice. An open-source implementation supports high-precision hypothesis testing and critical value calibration.
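To make "exact distribution" concrete, here is a standard derivation under the stated null (not taken from the paper): n observations fall independently into k equiprobable bins with counts c_1, …, c_k. The statistic depends on the counts only through the sum of their squares, which is what makes a dynamic program over bins possible. The auxiliary table W_j(m, s) below is one illustrative formulation introduced here, not the authors' notation; it sums the products of 1/c_i! over all count vectors for the first j bins with total count m and sum of squared counts s, and is taken as 0 outside its support.

```latex
\begin{align}
P(c_1,\dots,c_k) &= \frac{n!}{c_1!\,c_2!\cdots c_k!}\,k^{-n},
  \qquad \sum_{i=1}^{k} c_i = n \\
X^2 &= \sum_{i=1}^{k} \frac{(c_i - n/k)^2}{n/k}
     = \frac{k}{n}\sum_{i=1}^{k} c_i^2 - 2\sum_{i=1}^{k} c_i + n
     = \frac{k}{n}\sum_{i=1}^{k} c_i^2 - n \\
W_0(0,0) &= 1, \qquad
W_j(m,s) = \sum_{c=0}^{m} \frac{W_{j-1}\!\left(m-c,\; s-c^2\right)}{c!} \\
P\!\left(X^2 = \frac{k\,s}{n} - n\right) &= \frac{n!}{k^{n}}\, W_k(n,s)
\end{align}
```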


📝 Abstract
Pearson's chi-squared test is widely used to assess the uniformity of discrete histograms, typically relying on a continuous chi-squared distribution to approximate the distribution of the test statistic, since computing the exact distribution is computationally too costly. While effective in many cases, this approximation allegedly fails when expected bin counts are low or tail probabilities are needed. Here, Zero-disparity Distribution Synthesis, a fast dynamic programming approach for computing the exact distribution, is presented, enabling a detailed analysis of approximation errors. The results dispel some existing misunderstandings and also reveal subtle but significant pitfalls in approximation that are only apparent with exact values. The Python source code is available at https://github.com/DiscreteTotalVariation/ChiSquared.
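The abstract does not spell out the recursion, so the following is only a minimal sketch, assuming the null hypothesis is n independent draws into k equiprobable bins. It is a generic dynamic program over bins that tracks the running total count and the running sum of squared counts; it is not necessarily the authors' Zero-disparity Distribution Synthesis, and the function names exact_chi2_distribution, exact_p_value and exact_critical_value are hypothetical. Exact rational arithmetic (fractions.Fraction) keeps tail probabilities free of rounding error.

```python
from collections import defaultdict
from fractions import Fraction
from math import factorial


def exact_chi2_distribution(n: int, k: int) -> dict:
    """Exact null distribution of Pearson's X^2 for n draws into k equiprobable bins.

    Hypothetical helper: returns {attainable X^2 value: exact probability} as Fractions.
    """
    inv_fact = [Fraction(1, factorial(c)) for c in range(n + 1)]
    # weights[(m, s)]: sum of prod(1/c_i!) over count vectors of the bins processed
    # so far whose counts total m and whose squared counts total s.
    weights = {(0, 0): Fraction(1)}
    for _ in range(k):
        nxt = defaultdict(Fraction)
        for (m, s), w in weights.items():
            for c in range(n - m + 1):  # count placed in the current bin
                nxt[(m + c, s + c * c)] += w * inv_fact[c]
        weights = nxt
    scale = Fraction(factorial(n), k ** n)  # n! * (1/k)^n
    dist = defaultdict(Fraction)
    for (m, s), w in weights.items():
        if m == n:  # keep only count vectors that use all n draws
            # X^2 = (k/n) * sum(c_i^2) - n
            dist[Fraction(k * s, n) - n] += w * scale
    return dict(dist)


def exact_p_value(x2_obs, n: int, k: int) -> Fraction:
    """Hypothetical helper: exact P(X^2 >= x2_obs) under the uniform null."""
    dist = exact_chi2_distribution(n, k)
    return sum((p for x2, p in dist.items() if x2 >= x2_obs), Fraction(0))


def exact_critical_value(alpha, n: int, k: int):
    """Hypothetical helper: smallest attainable x with P(X^2 >= x) <= alpha, or None."""
    tail = Fraction(0)
    best = None
    # Walk from the largest attainable value downward, accumulating the upper tail.
    for x2, p in sorted(exact_chi2_distribution(n, k).items(), reverse=True):
        tail += p
        if tail > alpha:
            break
        best = x2
    return best
```

For example, exact_p_value(x, n, k) can be set beside the continuous approximation scipy.stats.chi2.sf(x, k - 1) for small expected counts n/k. The state space here grows roughly like n³ per bin, so this brute-force table is only practical for small n and k and should not be read as the "fast" algorithm the abstract describes.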
Problem

Research questions and friction points this paper is trying to address.

Exact calculation of chi-squared statistic distribution
Address approximation errors in low-count bins
Fast dynamic programming for discrete uniform histograms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fast dynamic programming for exact distribution
Zero-disparity Distribution Synthesis technique
Detailed analysis of approximation errors
Nikola Banić
University of Zagreb, Faculty of Electrical Engineering and Computing
image processing, color constancy, tone mapping
Neven Elezović
Department of Applied Mathematics, Faculty of Electrical Engineering and Computing, University of Zagreb, 10000 Zagreb, Croatia