Mapping Multivariate Phenotypes in the Presence of Missing Observations for Family-Based Data

📅 2025-04-15
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Missing multivariate phenotypes—continuous, count, and categorical—in family-based genetic studies hinder robust association analysis. Method: We propose a conditional-distribution-based, phenotype-specific multiple imputation framework: conditional normal, Poisson, and logistic regression models are employed for biologically interpretable imputation; subsequent association testing uses a distribution-free transmission disequilibrium test (TDT)-type statistic. Contribution/Results: This is the first unified, model-adapted imputation–testing pipeline explicitly designed for missing multivariate phenotypes in family data, overcoming limitations of conventional methods that require complete data or impose strong distributional assumptions. Simulation studies demonstrate that the proposed approach substantially restores statistical power—approaching the ideal level attainable under complete observation—while rigorously controlling Type I error rates across diverse genetic architectures.

Technology Category

Application Category

📝 Abstract
Clinical end-point traits are often characterized by quantitative or qualitative precursors and it has been argued that it may be statistically a more powerful strategy to analyze these precursor traits to decipher the genetic architecture of the underlying complex end-point trait. While association methods for both quantitative and qualitative traits have been extensively developed to analyze population level data, development of such methods are of current research interest for family-level data that pose additional challenges of incorporation of correlation of trait values within a family. Haldar and Ghosh (2015) developed a test which is Statistical equivalent of the classical TDT for quantitative traits and multivariate phenotypes. The model does not require a priori assumptions on the probability distributions of the phenotypes. However, it may often arise in practice that data on the phenotype of interest may not be available for all offspring in a nuclear family. In this study, we explore methodologies to estimate missing phenotypes conditioned on the available ones and carry out the transmission-based test for association on the 'complete' data. We consider three types of phenotypes: continuous, count and categorical. For a missing continuous phenotype, the trait value is estimated using a conditional normal model. For a missing count phenotypes, the trait value is estimated using a conditional Poisson model. For a missing categorical phenotype, the risk of the phenotype status is estimated using a conditional logistic model. We shall carry out simulations under a wide spectrum of genetic models and assess the effect of the proposed imputation strategy on the power of the association test vis-`a-vis the the ideal situation with no missing data.
Problem

Research questions and friction points this paper is trying to address.

Estimating missing phenotypes in family-based genetic studies
Developing association tests for multivariate phenotypes with missing data
Comparing imputation methods for continuous, count, and categorical traits
Innovation

Methods, ideas, or system contributions that make the work stand out.

Estimates missing phenotypes using conditional models
Handles continuous, count, and categorical phenotypes
Assesses imputation impact on association test power
🔎 Similar Papers
No similar papers found.
Soumya Sahu
Soumya Sahu
University of Illinois at Chicago
biostatistics
S
Saurabh Ghosh
Human Genetics Unit, Indian Statistical Institute, Kolkata