🤖 AI Summary
This study addresses a limitation of conventional two-stage approaches for linking individual-level distributional characteristics, such as variability and skewness, to downstream outcomes. Because these approaches ignore estimation error from the first stage, they can yield biased estimates and inflated Type I error rates. To overcome this, the authors propose the Distributional Feature Latent Variable Model (DFLVM), which integrates distributional features into a latent variable framework. DFLVM represents between-individual heterogeneity in these features as random intercepts and jointly models the features and their effects on outcomes within a single-step maximum likelihood estimation procedure, so first-stage estimation error is not carried into a separate second stage. Simulation studies and an empirical re-analysis indicate that DFLVM reduces estimation bias and false positive rates relative to two-step procedures.
📝 Abstract
Analyzing the mean response of study subjects in psychological research is a standard, well-justified practice. However, theoretical arguments and empirical evidence also suggest that there is value in investigating other aspects of the distribution of such responses, such as their variability or skewness.
A particular challenge that practitioners face is statistical modeling of associations between distributional features and other outcomes of interest. The most common approach is to perform estimation in two steps: distributional features are estimated first, and then those estimates are used as predictors for the relevant outcomes. Such an approach is most amenable to implementation in standard statistical software, but it ignores estimation error and can therefore lead to biased estimates and increased error rates.
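The bias from ignoring first-step estimation error can be illustrated with a small simulation. The sketch below is not the authors' simulation design; it assumes a hypothetical setup in which each person's log standard deviation is the distributional feature, the outcome depends linearly on it with slope `beta`, and only a few responses per person are observed. All names and parameter values (`simulate_two_step`, `n_obs`, the variances) are illustrative assumptions.

```python
import numpy as np


def simulate_two_step(n_persons=2000, n_obs=5, beta=1.0, seed=0):
    """Compare a two-step slope estimate with an 'oracle' slope.

    Hypothetical setup (an assumption, not taken from the paper):
    the distributional feature is each person's log standard deviation.
    """
    rng = np.random.default_rng(seed)

    # True person-specific log standard deviations (the latent feature).
    log_sd = rng.normal(0.0, 0.5, size=n_persons)

    # A few repeated responses per person with person-specific variability.
    y = rng.normal(0.0, np.exp(log_sd)[:, None], size=(n_persons, n_obs))

    # The downstream outcome depends on the TRUE feature with slope beta.
    outcome = beta * log_sd + rng.normal(0.0, 1.0, size=n_persons)

    # Step 1: estimate each person's feature from their own responses.
    log_sd_hat = np.log(y.std(axis=1, ddof=1))

    # Step 2: regress the outcome on the noisy estimates.
    slope_two_step = np.polyfit(log_sd_hat, outcome, 1)[0]

    # Oracle: regress on the true feature (unavailable in practice).
    slope_oracle = np.polyfit(log_sd, outcome, 1)[0]
    return slope_two_step, slope_oracle


if __name__ == "__main__":
    two_step, oracle = simulate_two_step()
    print(f"two-step slope: {two_step:.2f}, oracle slope: {oracle:.2f}")
```

Under these assumptions the two-step slope is pulled toward zero relative to the oracle slope, because the per-person estimates carry sampling noise that the second-step regression treats as if it were real between-person variation. Increasing `n_obs` shrinks that noise and the gap with it.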
We introduce Distributional Feature Latent Variable Models (DFLVM), a general framework that represents between-person differences in distributional features as random intercepts. These intercepts can simultaneously serve as predictors for downstream outcomes, with their associations estimated in a single estimation step. We compare the performance of our approach against two-step procedures in a simulation study and through a re-analysis of a real dataset.
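One way such a model could be written down, purely as an illustration, is a location-scale specification with a joint outcome equation. The form below is an assumption and not necessarily the specification used by the authors; all symbols ($y_{ij}$ for response $j$ of person $i$, $z_i$ for the downstream outcome, $u_{1i}$ and $u_{2i}$ for the random intercepts on the mean and log standard deviation) are introduced here for the sketch.

```latex
\begin{align}
  y_{ij} \mid u_i &\sim \mathcal{N}\!\left(\mu + u_{1i},\; \exp\{2(\gamma + u_{2i})\}\right), \\
  z_i \mid u_i &\sim \mathcal{N}\!\left(\alpha + \beta_1 u_{1i} + \beta_2 u_{2i},\; \tau^2\right), \\
  u_i = (u_{1i}, u_{2i})^\top &\sim \mathcal{N}(0, \Sigma).
\end{align}
```

In this sketch, $\beta_2$ is the association between person-level variability and the outcome. Because $u_i$ enters both equations and is integrated out of a single likelihood, uncertainty about each person's feature is carried into the estimate of $\beta_2$ rather than being ignored.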