🤖 AI Summary
Cohort designs are often infeasible under high attrition, and difference-in-differences (DID) estimators based on multi-period repeated cross-sectional (RCS) data tend to be imprecise. To address both problems, this paper proposes the DISC (Different Individuals, Same Clusters) sampling design, which draws independent samples of different individuals from the same clusters in each period and so combines advantages of the cohort and cross-sectional approaches. We show theoretically that DISC substantially improves the efficiency of DID estimation when cluster-level random effects are present, using a potential outcomes framework, variance decomposition, and intracluster correlation (ICC) analysis to establish valid statistical inference. For a design with 40 clusters of 25 individuals each (n = 1,000), DISC reduces the variance of the DID estimator by about 56%, 74%, and 86% relative to conventional RCS when ICC = 0.05, 0.1, and 0.2, respectively; equivalently, the RCS variance is 2.3, 3.8, and 7.3 times higher. DISC thus provides an efficient, implementable design for causal inference in real-world settings where tracking individuals is impractical, such as large-scale health surveys and policy evaluations.
📝 Abstract
We describe the DISC (Different Individuals, Same Clusters) design, a sampling scheme that can improve the precision of difference-in-differences (DID) estimators in settings involving repeated sampling of a population at multiple time points. Although cohort designs typically lead to more efficient DID estimators than repeated cross-sectional (RCS) designs, they are often impractical due to high rates of loss to follow-up, individuals leaving the risk set, or other reasons. The DISC design is a hybrid between a cohort sampling design and an RCS sampling design: the researcher takes a single sample of clusters, but then takes different cross-sectional samples of individuals within each cluster at two or more time points. We show that the DISC design can yield DID estimators with much higher precision than an RCS design, particularly if random cluster effects are present in the data-generating mechanism. For example, for a design in which 40 clusters and 25 individuals per cluster are sampled (for a total sample size of n = 1,000), the variance of a commonly used DID treatment effect estimator is 2.3 times higher in the RCS design for an intraclass correlation coefficient (ICC) of 0.05, 3.8 times higher for an ICC of 0.1, and 7.3 times higher for an ICC of 0.2.
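The variance ratios quoted above can be checked with a back-of-the-envelope calculation. The sketch below is an illustration under stated assumptions, not the paper's own derivation. It assumes a two-period, two-group DID, a random-intercept model with a time-invariant cluster effect, total outcome variance normalized to 1, and the same number of clusters and individuals per cluster in both designs. Under those assumptions each RCS cell mean carries the usual design effect 1 + (m - 1)·ICC, while in DISC the cluster effect cancels in the within-cluster change, leaving only the individual-level variance 1 - ICC. The function name `rcs_to_disc_variance_ratio` is hypothetical.

```python
def rcs_to_disc_variance_ratio(icc: float, m: int) -> float:
    """Ratio Var(DID | RCS) / Var(DID | DISC) under an assumed simple model.

    Assumptions (illustrative, not taken from the paper):
      - outcome = cell mean + time-invariant cluster effect + individual error
      - total variance 1, so cluster variance = icc, individual variance = 1 - icc
      - m individuals sampled per cluster at each time point
      - both designs use the same number of clusters per cell
    """
    if not 0.0 <= icc < 1.0:
        raise ValueError("icc must lie in [0, 1)")
    if m < 1:
        raise ValueError("m must be a positive integer")

    # RCS: new clusters are drawn each period, so every cell mean is inflated
    # by the design effect 1 + (m - 1) * icc.
    rcs_factor = 1.0 + (m - 1) * icc

    # DISC: the same clusters are revisited, so the cluster effect cancels in
    # the before/after difference and only individual-level variance remains.
    disc_factor = 1.0 - icc

    # Common factors (number of cells, clusters, individuals) cancel in the ratio.
    return rcs_factor / disc_factor


# The abstract's example: 25 individuals per cluster.
ratios = {icc: rcs_to_disc_variance_ratio(icc, m=25) for icc in (0.05, 0.1, 0.2)}
for icc, ratio in ratios.items():
    print(f"ICC = {icc}: RCS variance is about {ratio:.2f} times the DISC variance")
```

With m = 25 this gives ratios of approximately 2.32, 3.78, and 7.25 for ICCs of 0.05, 0.1, and 0.2, in line with the 2.3, 3.8, and 7.3 reported above. Under these assumptions the ratio does not depend on the number of clusters, only on the ICC and the number of individuals sampled per cluster.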