🤖 AI Summary
This work presents a flexible monitoring framework for two-arm randomized controlled trials that addresses Type I error control in adaptive designs with frequent interim analyses and data-dependent adaptations. Built on e-values and e-processes, the approach provides valid inference under composite null hypotheses, supports futility monitoring, and can be integrated with group sequential and Bayesian adaptive workflows. E-processes are constructed via betting martingales and combined with calibration strategies, multiplicity adjustments for platform trials, and hybrid design elements, so that Type I error is controlled without pre-specified analysis times. The framework is implemented in the open-source R package `evalinger`. In the numerical experiments, the calibrated group sequential rule has the highest power among the valid methods in the main comparison, but under continuous monitoring the e-value method retains Type I error control and overtakes the group sequential approach in power.
📝 Abstract
Adaptive clinical trials rely on interim analyses, flexible stopping, and data-dependent design modifications that complicate statistical guarantees when fixed-horizon test statistics are repeatedly inspected or reused after adaptations. E-values and e-processes provide anytime-valid tests and confidence sequences that remain valid under optional stopping and optional continuation without requiring a prespecified monitoring schedule. This paper is a methodology guide for practitioners. We develop the betting-martingale construction of e-processes for two-arm randomized controlled trials, show how e-values naturally handle composite null hypotheses and support futility monitoring, and provide guidance on when e-values are appropriate, when established alternatives are preferable, and how to integrate e-value monitoring with group sequential and Bayesian adaptive workflows. A numerical study compares five monitoring rules in a two-arm binary-endpoint trial: naive and calibrated frequentist rules, naive and calibrated Bayesian rules, and an e-value rule. Naive repeated testing and naive posterior thresholds inflate Type I error substantially under frequent interim looks. Among the valid methods, the calibrated group sequential rule achieves the highest power, the e-value rule provides robust anytime-valid control with moderate power, and the calibrated Bayesian rule is the most conservative. Extended simulations show that the power gap between group sequential and e-value methods depends on the monitoring schedule and reverses under continuous monitoring. The methodology, including futility monitoring, platform trial multiplicity control, and hybrid strategies combining e-values with established methods, is implemented in the open-source R package `evalinger` and situated within the regulatory framework of the January 2026 FDA draft guidance on Bayesian methodology.
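To make the betting-martingale idea concrete, the following is a minimal Python sketch of one possible e-process for a two-arm binary-endpoint trial. It is an illustration under stated assumptions, not the paper's construction and not the `evalinger` API: patients are assumed to arrive in 1:1 pairs (one per arm), the null hypothesis is taken to be that the treatment response rate does not exceed the control rate, and the bet size `lam` is a fixed, hypothetical tuning constant. All function names are invented for this example. Under that null, each paired difference has non-positive mean, so the running product is a nonnegative supermartingale, and Ville's inequality bounds the probability of ever reaching `1 / alpha` by `alpha`, whenever the data are inspected.

```python
import random


def betting_e_process(pairs, lam=0.25):
    """Running e-values for H0: p_treatment <= p_control (illustrative sketch).

    pairs: iterable of (x_treatment, x_control) binary outcomes, one pair per step.
    lam:   fixed bet fraction in [0, 1); chosen before seeing any data.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1) to keep the wealth positive")
    wealth = 1.0
    path = []
    for x_t, x_c in pairs:
        d = x_t - x_c              # in {-1, 0, 1}; mean <= 0 under H0
        wealth *= 1.0 + lam * d    # factor >= 1 - lam > 0, expectation <= 1 under H0
        path.append(wealth)
    return path


def first_rejection(path, alpha=0.05):
    """1-based index of the first look where the e-value reaches 1/alpha, else None."""
    threshold = 1.0 / alpha
    for i, e in enumerate(path, start=1):
        if e >= threshold:
            return i
    return None


def simulate_pairs(n_pairs, p_treatment, p_control, seed=0):
    """Draw paired Bernoulli outcomes; the response rates are assumptions of the demo."""
    rng = random.Random(seed)
    return [
        (int(rng.random() < p_treatment), int(rng.random() < p_control))
        for _ in range(n_pairs)
    ]


if __name__ == "__main__":
    # Hypothetical scenario: 60% vs 40% response, monitored after every pair.
    data = simulate_pairs(400, p_treatment=0.6, p_control=0.4, seed=1)
    path = betting_e_process(data, lam=0.25)
    print("stopped at pair:", first_rejection(path, alpha=0.05))
    print("final e-value:", path[-1])
```

A fixed `lam` is the simplest valid choice. A stake that adapts to earlier outcomes would typically improve power and remains valid as long as it depends only on past data.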