🤖 AI Summary
Computing the first extinction time during resampling exhibits exponential complexity, O(2^M), in the number of states M, hindering scalability in population genetics and self-training dynamics.
Method: We propose an analytical framework based on a square-root diffusion approximation: modeling polynomial updates as zero-drift stochastic processes and deriving a closed-form solution for the first extinction time.
Contribution/Results: Our approach reduces computational complexity from exponential, O(2^M), to linear, O(M), enabling scalable inference. The theoretical mean exactly matches that of the Wright–Fisher model, and Monte Carlo simulations confirm both high accuracy and efficiency. Furthermore, the derived analytical law successfully predicts the critical onset of model collapse in self-training, offering a unified, interpretable theoretical lens for resampling dynamics. This framework bridges stochastic population modeling and modern machine learning phenomena, yielding both rigorous guarantees and practical utility.
📝 Abstract
Extinction times in resampling processes are fundamental yet often intractable, as previous formulas scale as $2^M$ with the number of states $M$ present in the initial probability distribution. We solve this by treating multinomial updates as independent square-root diffusions with zero drift, yielding a closed-form law for the first-extinction time. We prove that the mean coincides exactly with the Wright–Fisher result of Baxter et al., thereby replacing exponential-cost evaluations with a linear-cost expression, and we validate this result through extensive simulations. Finally, we demonstrate predictive power for model collapse in a simple self-training setup: the onset of collapse coincides with the resampling-driven first-extinction time computed from the model's initial stationary distribution. These results hint at a unified view of resampling extinction dynamics.
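The resampling process studied here can be illustrated with a minimal Monte Carlo sketch: repeatedly draw multinomial samples from the current empirical distribution and record the first generation at which some state receives zero mass. This is an illustrative simulation of the Wright–Fisher-style update, not the paper's closed-form law; the population size `N`, number of states `M`, and uniform initial distribution are assumptions for the example.

```python
import numpy as np

def first_extinction_time(p0, N, rng):
    """Generations until the first of the M states has zero count
    under repeated multinomial resampling (Wright-Fisher style)."""
    p = np.asarray(p0, dtype=float)
    t = 0
    while np.all(p > 0):
        counts = rng.multinomial(N, p)  # resample N individuals
        p = counts / N                  # new empirical distribution
        t += 1
    return t

# Example: M = 4 states, population N = 100, uniform start (assumed values)
rng = np.random.default_rng(0)
M, N = 4, 100
p0 = np.full(M, 1.0 / M)
times = [first_extinction_time(p0, N, rng) for _ in range(1000)]
print(np.mean(times))  # empirical mean first-extinction time
```

Averaging such runs gives the empirical mean against which a closed-form expression can be checked.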