Time-to-event (TTE) endpoints are widely used in drug development and biomedical research. Traditional statistical models, such as the Cox regression model, have long been used to predict TTE outcomes. Recent studies have also employed flexible machine learning (ML) methods, such as tree-based models, to obtain superior predictive performance. In addition, post-baseline time-varying predictors have recently been reported to improve prediction with ML methods. In this study, we applied the Cox model and ML methods to predict TTE outcomes using both baseline and post-baseline predictors. We evaluated the predictive performance of these models using several metrics: the time-dependent area under the receiver operating characteristic curve (AUC), the concordance index (C-index), and the integrated Brier score. We also used these metrics as criteria to guide the selection of predictors in the predictive models. Our findings indicate that the Cox model remains a robust choice, often comparable to ML methods at moderate sample sizes, provided the proportional hazards assumption holds. However, tree-based methods perform better at capturing complex, nonlinear interactions, although they require larger sample sizes to stabilize predictions.
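To make the concordance index concrete, the following is a minimal sketch of one common estimator, Harrell's C-index, in plain Python. It is an illustration under stated assumptions, not the implementation used in the study: the function name `harrell_c_index` and the toy data are hypothetical, higher risk scores are assumed to mean earlier events, and pairs with tied observed times are simply skipped.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs that the risk scores order correctly.

    times       -- observed follow-up times
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- model output; higher is assumed to mean earlier event
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A pair is comparable only when the shorter time is an observed event.
            continue
        for j in range(n):
            if times[i] < times[j]:  # strict inequality: tied times are skipped
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four subjects, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.4, 0.1]
toy_c = harrell_c_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. The double loop is quadratic in the number of subjects, which is acceptable for a sketch but not for large datasets.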
Clinical trial design for rare diseases can be challenging due to limited data, heterogeneous clinical manifestations and progression, and a frequent lack of adequate knowledge about the disease. Multiple endpoints are usually used to collectively assess the effectiveness of the investigational drug on multiple aspects of the disease. Here we propose an adaptive design based on the promising zone framework, allowing for sample size re-estimation (SSR) using interim data in a clinical trial involving multiple endpoints. The proposed SSR procedure incorporates two global tests: the ordinary least squares (OLS) test and the nonparametric permutation test. We consider two SSR approaches: one based on power (SSR-Power) and the other on conditional power (SSR-CP). Simulation results show that the adaptive design achieves type I error control and satisfactory power. Compared with the permutation test, the OLS test provides better type I error control when the sample size is small and the interim analysis is early, whereas the permutation test achieves slightly higher power in most scenarios. Regarding the SSR methods, SSR-CP consistently achieves higher power than SSR-Power but often requires a larger sample size and more frequently reaches the maximum allowable sample size. The proposed design is particularly useful when the trial has a small initial sample size and has the opportunity to adjust the sample size at an interim analysis to achieve adequate power.
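The idea behind conditional-power-based re-estimation in a promising zone can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes a single one-sided z-test in place of the OLS or permutation global test, computes conditional power under the interim trend, and uses hypothetical zone bounds (`cp_lower`, `target_cp`) and function names. It also ignores the adjustments to the final test that a real adaptive design may need.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def conditional_power(z1, n1, n2, alpha=0.025):
    """Probability of rejecting at final size n2 given interim z1 at size n1,
    assuming the effect estimated at the interim continues (current trend)."""
    if not 0 < n1 < n2:
        raise ValueError("require 0 < n1 < n2")
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha)
    remaining = n2 - n1
    # Value the second-stage z-statistic must exceed for the final test to reject.
    threshold = (z_alpha * sqrt(n2) - z1 * sqrt(n1)) / sqrt(remaining)
    # Expected second-stage z-statistic if the interim trend persists.
    drift = z1 * sqrt(remaining) / sqrt(n1)
    return 1 - _STD_NORMAL.cdf(threshold - drift)


def reestimate_sample_size(z1, n1, n_planned, n_max,
                           target_cp=0.8, cp_lower=0.3, alpha=0.025):
    """Return the final sample size under a promising-zone rule (assumed bounds)."""
    cp = conditional_power(z1, n1, n_planned, alpha)
    if not (cp_lower <= cp < target_cp):
        # Unfavorable or favorable zone: keep the planned sample size.
        return n_planned
    # Promising zone: smallest size that reaches the target, capped at n_max.
    for n in range(n_planned + 1, n_max + 1):
        if conditional_power(z1, n1, n, alpha) >= target_cp:
            return n
    return n_max
```

For example, `reestimate_sample_size(1.5, 50, 100, 200)` increases the sample size to a value between the planned and maximum sizes, while a weaker interim signal such as `z1 = 1.2` is capped at `n_max`, which mirrors the tendency of conditional-power rules to reach the maximum allowable sample size.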