AI Summary
This study addresses the challenges in testing Hardy-Weinberg equilibrium (HWE) on the X chromosome, where sex-specific genotype structures and sex-differential minor allele frequencies (sdMAF) lead to ambiguous null hypotheses and inconsistent methodologies. The authors propose a unified statistical framework based on allelic regression that reframes HWE testing as an assessment of allelic dependence. This approach explicitly clarifies the null hypotheses, degrees of freedom, and sensitivity to sdMAF across different tests, while naturally accommodating covariate adjustment and correction for population structure. The framework subsumes existing chi-square methods and achieves coherent HWE inference between autosomes and the X chromosome. Simulations and analyses of high-coverage data from the 1000 Genomes Project demonstrate that the proposed method properly controls Type I error and avoids the misleading conclusions often produced by conventional approaches in the presence of sdMAF.
Abstract
Testing for Hardy-Weinberg equilibrium (HWE) is a fundamental component of genetic data analysis, widely used for quality control and model validation. Although HWE testing is well established for autosomal loci, inference on the X chromosome is more complex due to sex-specific genotype structures and potential sex differences in minor allele frequency (sdMAF). Existing tests differ in their assumptions about sdMAF and male sample inclusion, often leading to distinct but poorly characterized null hypotheses.
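For orientation, the well-established autosomal case can be sketched as a textbook Pearson chi-square test with one degree of freedom. This is a minimal illustration and not code from the paper; the function name `hwe_pearson_chisq` and the genotype counts are made up for the example.

```python
import math

def hwe_pearson_chisq(n_aa, n_ab, n_bb):
    """Textbook Pearson chi-square HWE test for one biallelic autosomal locus.

    Hypothetical helper for illustration; counts are genotype counts AA, AB, BB.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)   # estimated frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic locus: HWE test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)   # HWE proportions
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

stat, p_value = hwe_pearson_chisq(30, 40, 30)   # made-up counts
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")   # chi-square = 4.00, p = 0.0455
```

On the X chromosome the same recipe is not directly applicable, because males carry a single allele and contribute no heterozygotes.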
We develop a general statistical framework for HWE inference using the robust allele-based regression model. By formulating HWE testing as an assessment of allele-level dependence, the framework directly parameterizes Hardy-Weinberg disequilibrium, unifies existing Pearson chi-square-based tests under explicit modeling assumptions, and clarifies their null hypotheses, degrees of freedom, and sensitivity to sdMAF. The framework also accommodates covariate and population-structure adjustment within a unified regression-based formulation.
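The idea of treating HWE as allele-level dependence can be illustrated with a deliberately simplified sketch. It is not the authors' robust allele-based regression model: it only regresses one allele indicator on the other within each individual (with the unordered pair entered in both orders) and uses the standard identity that n times the squared slope equals the autosomal Pearson chi-square statistic. The function name `allelic_slope` and the counts are assumptions for the example.

```python
def allelic_slope(n_aa, n_ab, n_bb):
    """Least-squares slope of one allele indicator on the other (A = 1, B = 0).

    Hypothetical helper for illustration. Each individual contributes its two
    alleles in both orders, so the slope estimates delta / (p * q), where
    delta = P(AA) - p**2 is the Hardy-Weinberg disequilibrium coefficient.
    """
    # (x, y, weight): symmetrized allele pairs with their multiplicities
    pairs = [(1, 1, 2 * n_aa), (1, 0, n_ab), (0, 1, n_ab), (0, 0, 2 * n_bb)]
    w = sum(c for _, _, c in pairs)            # 2n ordered pairs
    mx = sum(x * c for x, _, c in pairs) / w   # allele frequency p
    my = sum(y * c for _, y, c in pairs) / w   # also p, by symmetry
    sxy = sum(c * (x - mx) * (y - my) for x, y, c in pairs) / w   # delta
    sxx = sum(c * (x - mx) ** 2 for x, _, c in pairs) / w         # p * q
    if sxx == 0:
        raise ValueError("monomorphic locus: slope is undefined")
    slope = sxy / sxx
    n = w / 2                                  # number of individuals
    return slope, n * slope ** 2               # slope and chi-square statistic

slope, stat = allelic_slope(30, 40, 30)        # made-up counts
print(f"slope = {slope:.2f}, n * slope^2 = {stat:.2f}")   # slope = 0.20, n * slope^2 = 4.00
```

A slope of zero corresponds to independent alleles, that is, HWE; a regression formulation of this kind is what makes it natural to add covariates to the model.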
The proposed framework provides robust, interpretable, and flexible inference, establishing a unified statistical foundation for HWE testing across autosomal and X-chromosomal regions. Simulation studies and analysis of high-coverage 1000 Genomes Project data demonstrate that commonly used X-chromosome tests can exhibit inflated type I error or misleading inference when sdMAF is present.
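A toy calculation can show how sdMAF alone can drive an X-chromosome test that pools allele frequencies across sexes. The sketch below is not the paper's simulation: the counts are invented, the function names are hypothetical, and the pooled statistic is a generic five-category Pearson statistic (male A, male B, female AA, AB, BB) that is assumed here to be referred to a chi-square distribution with 2 degrees of freedom.

```python
import math

def female_only_stat(f_aa, f_ab, f_bb):
    """Pearson HWE statistic using females only (hypothetical helper)."""
    n_f = f_aa + f_ab + f_bb
    p = (2 * f_aa + f_ab) / (2 * n_f)
    q = 1.0 - p
    expected = (n_f * p * p, 2 * n_f * p * q, n_f * q * q)
    return sum((o - e) ** 2 / e for o, e in zip((f_aa, f_ab, f_bb), expected))

def pooled_x_stat(m_a, m_b, f_aa, f_ab, f_bb):
    """Five-category Pearson statistic with one allele frequency pooled over sexes."""
    n_m = m_a + m_b
    n_f = f_aa + f_ab + f_bb
    # Males contribute one X allele each, females two
    p = (m_a + 2 * f_aa + f_ab) / (n_m + 2 * n_f)
    q = 1.0 - p
    observed = (m_a, m_b, f_aa, f_ab, f_bb)
    expected = (n_m * p, n_m * q, n_f * p * p, 2 * n_f * p * q, n_f * q * q)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Assumed reference distribution: chi-square with 2 df, P(X > x) = exp(-x / 2)
    return stat, math.exp(-stat / 2.0)

# Made-up counts: females exactly in HWE at p = 0.5, males at p = 0.8 (sdMAF)
females = (25, 50, 25)
males = (80, 20)
print(f"female-only statistic = {female_only_stat(*females):.2f}")   # 0.00
stat, p_value = pooled_x_stat(*males, *females)
print(f"pooled statistic = {stat:.2f}")                              # 25.17
```

In this constructed example the females show no departure from HWE at all, yet the pooled statistic is large, so a rejection reflects the sex difference in allele frequency rather than Hardy-Weinberg disequilibrium.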