Predictive Performance of Statistical and Machine Learning Survival Models with Time-Dependent Covariates: An Evaluation
Pub. online: 26 March 2026
Type: Case Study, Application, and/or Practice Article
Open Access
Area: Machine Learning and Data Mining
Accepted
16 January 2026
Published
26 March 2026
Abstract
Time-to-event (TTE) endpoints are widely used in drug development and biomedical research. Traditional statistical models, such as the Cox regression model, have long been used to predict TTE outcomes. Recent studies have also employed flexible machine learning (ML) methods, such as tree-based models, to improve predictive performance. In addition, post-baseline time-varying predictors have recently been reported to improve the predictions of ML methods. In this study, we applied the Cox model and ML methods to predict TTE outcomes using both baseline and post-baseline predictors. We evaluated the predictive performance of these models using several metrics: the time-dependent area under the receiver operating characteristic curve (AUC), the concordance index (C-index), and the integrated Brier score. We also used these metrics as criteria to guide the selection of predictors in the predictive models. Our findings indicate that the Cox model remains a robust choice, often performing comparably to ML methods at moderate sample sizes, provided the proportional hazards assumption holds. However, tree-based methods are better at capturing complex, nonlinear interactions, although they require larger sample sizes to produce stable predictions.
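As an illustration of one of the metrics named above, the following is a minimal sketch of Harrell's C-index for right-censored data. It is not the authors' implementation: the function name `harrell_c_index`, the toy inputs, and the convention that a higher score means higher predicted risk are assumptions made for this example. Pairs with tied observed times are simply skipped, which is a simplification.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times:       observed follow-up times
    events:      1 if the event was observed, 0 if censored
    risk_scores: higher value = higher predicted risk (assumed convention)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the two times differ and the
        # subject with the shorter time had an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk failed first: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: five subjects, two of them censored.
times = [5, 8, 12, 3, 9]
events = [1, 0, 1, 1, 0]
risk = [0.9, 0.4, 0.2, 0.8, 0.5]
print(round(harrell_c_index(times, events, risk), 3))  # 0.857 (6 of 7 comparable pairs)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Unlike the time-dependent AUC and the integrated Brier score, this quantity assesses only the ordering of predicted risks, not calibration.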