🤖 AI Summary
Bayesian response-adaptive randomization (BRAR) designs lack theoretical guidance on choosing the length of the burn-in period, which can lead to poor trade-offs among Type I error control, statistical power, and patient benefit.
Method: We systematically investigate the non-monotonic effect of burn-in length on these operating characteristics in two-arm clinical trials with binary outcomes. We develop an evaluation framework based on exact probability calculations and conditional hypothesis testing, and compare asymptotic, calibrated, and exact tests.
Contribution/Results: We show that the conditional exact test achieves the highest average and minimum power among the tests compared for both small and moderate burn-in lengths. Our analysis also yields principled recommendations for redesigning the ARREST trial that substantially improve power while strengthening Type I error control. Crucially, no universally optimal burn-in length exists. Its selection must be tailored to the trial's primary objective, whether that is stringent error control or maximal patient benefit, by explicitly calibrating the trade-offs.
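The Method paragraph refers to exact probability calculations. The Python sketch below shows one such exact evaluation. Its assumptions are ours, not the paper's:

- The burn-in assigns the first `burn_in` participants by 1:1 simple randomization.
- The adaptive stage uses Thompson sampling with Beta(1, 1) priors, so arm B is chosen with the posterior probability that it is the better arm.
- The analysis is a one-sided Wald z-test at level 0.025.

The allocation probability depends only on the per-arm counts (n_A, s_A, n_B, s_B). The distribution of the final counts can therefore be computed exactly by a forward recursion over these states, with no simulation error. All function names are hypothetical.

```python
import math
from collections import defaultdict
from functools import lru_cache


def log_beta(x: float, y: float) -> float:
    """Natural log of the Beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


@lru_cache(maxsize=None)
def prob_b_beats_a(a_a: int, b_a: int, a_b: int, b_b: int) -> float:
    """Exact P(theta_B > theta_A) for independent theta_A ~ Beta(a_a, b_a) and
    theta_B ~ Beta(a_b, b_b) with integer parameters (finite-sum closed form)."""
    total = 0.0
    for i in range(a_b):
        total += math.exp(log_beta(a_a + i, b_a + b_b) - math.log(b_b + i)
                          - log_beta(1 + i, b_b) - log_beta(a_a, b_a))
    return min(max(total, 0.0), 1.0)


def alloc_prob_b(n_a: int, s_a: int, n_b: int, s_b: int, t: int, burn_in: int) -> float:
    """Probability of assigning participant t (0-based) to arm B (assumed rule)."""
    if t < burn_in:
        return 0.5  # burn-in: 1:1 simple randomization, no adaptation
    # Adaptive stage: Thompson sampling with Beta(1, 1) priors on both arms
    return prob_b_beats_a(1 + s_a, 1 + n_a - s_a, 1 + s_b, 1 + n_b - s_b)


def final_state_distribution(n: int, burn_in: int, p_a: float, p_b: float) -> dict:
    """Exact distribution of the final state (n_a, s_a, n_b, s_b) by forward recursion."""
    dist = {(0, 0, 0, 0): 1.0}
    for t in range(n):
        nxt = defaultdict(float)
        for (n_a, s_a, n_b, s_b), prob in dist.items():
            pi_b = alloc_prob_b(n_a, s_a, n_b, s_b, t, burn_in)
            pi_a = 1.0 - pi_b
            nxt[(n_a + 1, s_a + 1, n_b, s_b)] += prob * pi_a * p_a        # arm A, success
            nxt[(n_a + 1, s_a, n_b, s_b)] += prob * pi_a * (1.0 - p_a)    # arm A, failure
            nxt[(n_a, s_a, n_b + 1, s_b + 1)] += prob * pi_b * p_b        # arm B, success
            nxt[(n_a, s_a, n_b + 1, s_b)] += prob * pi_b * (1.0 - p_b)    # arm B, failure
        dist = dict(nxt)
    return dist


def wald_rejects(n_a: int, s_a: int, n_b: int, s_b: int, z_crit: float = 1.959964) -> bool:
    """One-sided Wald z-test of p_B > p_A (no rejection if an arm is empty or variance is 0)."""
    if n_a == 0 or n_b == 0:
        return False
    pa, pb = s_a / n_a, s_b / n_b
    var = pa * (1 - pa) / n_a + pb * (1 - pb) / n_b
    return var > 0 and (pb - pa) / math.sqrt(var) > z_crit


def operating_characteristics(n: int, burn_in: int, p_a: float, p_b: float):
    """Exact rejection rate of the Wald test and expected proportion allocated to arm B."""
    dist = final_state_distribution(n, burn_in, p_a, p_b)
    rejection = sum(pr for state, pr in dist.items() if wald_rejects(*state))
    prop_b = sum(pr * state[2] for state, pr in dist.items()) / n
    return rejection, prop_b


if __name__ == "__main__":
    # Hypothetical settings: n = 20, null p = 0.3, alternative p_B = 0.7
    for burn_in in (2, 6, 10, 20):
        t1e, _ = operating_characteristics(20, burn_in, 0.3, 0.3)
        power, prop_b = operating_characteristics(20, burn_in, 0.3, 0.7)
        print(f"burn-in {burn_in:2d}: type I error {t1e:.4f}, "
              f"power {power:.4f}, E[share on B] {prop_b:.3f}")
```

For several burn-in lengths, running the module prints three quantities: the exact type I error of the Wald test, its power, and the expected share of participants on arm B. Comparing rows is one way to see the kind of trade-off the summary describes. The recursion is brute force. The final layer alone has C(n + 3, 3) states (1771 when n = 20), so it suits small trials.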
📝 Abstract
Response-adaptive (RA) trials offer the potential to enhance participant benefit, but they also complicate valid statistical analysis and can lead to a higher proportion of participants receiving an inferior treatment. A common approach to mitigating these disadvantages is to introduce a fixed, non-adaptive randomization stage at the start of the RA design, known as the burn-in period. However, investigations of, and guidance on, the effect of the burn-in length are currently scarce. To address this gap, this paper provides an exact evaluation approach for investigating how the burn-in length affects the statistical properties of two-arm RA designs with binary outcomes. We show three results:

(1) For commonly used calibrated and asymptotic tests, increasing the burn-in length reduces type I error rate inflation but does not achieve strict type I error rate control, which necessitates exact tests.
(2) The burn-in length substantially influences power and participant benefit, and these measures are often not maximized at either the minimum or the maximum possible burn-in length.
(3) Compared with the other tests, the conditional exact test that conditions on the total number of successes provides the highest average and minimum power for both small and moderate burn-in lengths.

Using our exact analysis method, we redesign the ARREST trial to improve its statistical properties.
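Point (3) concerns the conditional exact test that conditions on total successes. The Python sketch below shows why such a test can be exact in an RA design. Its assumptions are illustrative and are not confirmed by the abstract:

- The burn-in uses 1:1 randomization.
- The adaptive stage that follows uses Thompson sampling with Beta(1, 1) priors.
- The test statistic is the difference in observed success proportions.

Under H0: p_A = p_B = p, any sequence of allocations and outcomes with S total successes has probability (product of allocation probabilities) × p^S (1 - p)^(n - S). Summing the first factor over all paths that reach a state gives a weight W(state) that does not involve p. The null distribution of the final state given S is therefore W(state) divided by the total weight of all states with the same S. A p-value computed from this distribution is valid whatever the common success rate. The function names are hypothetical.

```python
import math
from collections import defaultdict
from functools import lru_cache


def log_beta(x: float, y: float) -> float:
    """Natural log of the Beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


@lru_cache(maxsize=None)
def prob_b_beats_a(a_a: int, b_a: int, a_b: int, b_b: int) -> float:
    """Exact P(theta_B > theta_A) for independent Beta posteriors with integer parameters."""
    total = 0.0
    for i in range(a_b):
        total += math.exp(log_beta(a_a + i, b_a + b_b) - math.log(b_b + i)
                          - log_beta(1 + i, b_b) - log_beta(a_a, b_a))
    return min(max(total, 0.0), 1.0)


def alloc_prob_b(n_a: int, s_a: int, n_b: int, s_b: int, t: int, burn_in: int) -> float:
    """Assumed rule: 1:1 randomization in burn-in, then Thompson sampling (Beta(1, 1) priors)."""
    if t < burn_in:
        return 0.5
    return prob_b_beats_a(1 + s_a, 1 + n_a - s_a, 1 + s_b, 1 + n_b - s_b)


def path_weights(n: int, burn_in: int) -> dict:
    """W(state): sum over all paths reaching the final state of the product of
    allocation probabilities. Under H0, P(state) = W * p**S * (1 - p)**(n - S)."""
    weights = {(0, 0, 0, 0): 1.0}
    for t in range(n):
        nxt = defaultdict(float)
        for (n_a, s_a, n_b, s_b), w in weights.items():
            pi_b = alloc_prob_b(n_a, s_a, n_b, s_b, t, burn_in)
            for success in (0, 1):  # outcome factors p or (1 - p) are kept out of W
                nxt[(n_a + 1, s_a + success, n_b, s_b)] += w * (1.0 - pi_b)
                nxt[(n_a, s_a, n_b + 1, s_b + success)] += w * pi_b
        weights = dict(nxt)
    return weights


def statistic(n_a: int, s_a: int, n_b: int, s_b: int) -> float:
    """Illustrative test statistic: difference in success proportions (0 if an arm is empty)."""
    if n_a == 0 or n_b == 0:
        return 0.0
    return s_b / n_b - s_a / n_a


def conditional_p_values(weights: dict, tol: float = 1e-12) -> dict:
    """One-sided p-value P(T >= t_obs | S = s_obs) under H0 for every final state.
    Conditioning on S cancels p, so no nuisance parameter has to be estimated."""
    by_total = defaultdict(list)
    for state, w in weights.items():
        by_total[state[1] + state[3]].append((statistic(*state), w))
    p_values = {}
    for state in weights:
        group = by_total[state[1] + state[3]]
        t_obs = statistic(*state)
        numer = sum(w for t, w in group if t >= t_obs - tol)  # tol keeps float ties together
        p_values[state] = numer / sum(w for _, w in group)
    return p_values


def unconditional_type1_error(n: int, burn_in: int, p: float, alpha: float = 0.025) -> float:
    """Exact rejection probability of the conditional test under H0: p_A = p_B = p."""
    weights = path_weights(n, burn_in)
    p_values = conditional_p_values(weights)
    return sum(w * p ** (st[1] + st[3]) * (1 - p) ** (n - st[1] - st[3])
               for st, w in weights.items() if p_values[st] <= alpha)


if __name__ == "__main__":
    weights = path_weights(20, 6)
    p_values = conditional_p_values(weights)
    observed = (8, 2, 12, 9)  # hypothetical data: 2/8 successes on A, 9/12 on B
    print(f"conditional p-value: {p_values[observed]:.4f}")
    for p in (0.1, 0.3, 0.5):
        print(f"p = {p}: exact type I error {unconditional_type1_error(20, 6, p):.4f}")
```

For every value of S, P(p-value ≤ α | S) ≤ α. The unconditional rejection rate under H0 is therefore at most α for every p, and `unconditional_type1_error` computes it exactly so this can be checked for a given design. Per point (1), strict control of this kind is what lengthening the burn-in alone does not deliver for the calibrated and asymptotic tests.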