The New England Journal of Statistics in Data Science

Predictive Performance of Statistical and Machine Learning Survival Models with Time-Dependent Covariates: An Evaluation
Zhaohua Lu, Philip He

https://doi.org/10.51387/26-NEJSDS98
Pub. online: 26 March 2026
Type: Case Study, Application, and/or Practice Article
Open Access
Area: Machine Learning and Data Mining

Accepted: 16 January 2026
Published: 26 March 2026

Abstract

Time-to-event (TTE) endpoints are widely used in drug development and biomedical research. Traditional statistical models, such as the Cox regression model, have long been used to predict TTE outcomes. Recent studies have also employed flexible machine learning (ML) methods, such as tree-based models, to improve predictive performance, and post-baseline time-varying predictors have recently been reported to further improve prediction with ML methods. In this study, we applied the Cox model and ML methods to predict TTE outcomes using both baseline and post-baseline predictors. We evaluated predictive performance with several metrics: the time-dependent area under the receiver operating characteristic curve (AUC), the concordance index (C-index), and the integrated Brier score. We also used these metrics as criteria to guide predictor selection. Our findings indicate that the Cox model remains a robust choice, often comparable to ML methods at moderate sample sizes, provided the proportional hazards assumption holds. Tree-based methods, however, perform better at capturing complex, nonlinear interactions, although they require larger sample sizes to produce stable predictions.
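
As an illustration of one of the metrics named above, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is not the authors' implementation: the function name `harrell_c_index` and its arguments are hypothetical, pairs with tied observed times are simply skipped (a simplification), and time-varying covariates are not handled. It only shows how concordant pairs are counted from observed times, event indicators, and predicted risk scores.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch).

    A pair is comparable when the subject with the shorter observed
    time had an event. It is concordant when that subject also has the
    higher predicted risk; tied risk scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            # Simplification: skip tied times; a censored earlier
            # subject makes the pair non-comparable.
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the third subject is censored (event indicator 0).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = harrell_c_index(times, events, [0.9, 0.7, 0.5, 0.1])
reversed_order = harrell_c_index(times, events, [0.1, 0.5, 0.7, 0.9])
uninformative = harrell_c_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

With these toy inputs, risk scores that decrease with survival time give a C-index of 1, the reversed scores give 0, and constant scores give 0.5, matching the usual reading of the C-index as the probability of correctly ranking a comparable pair.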


Copyright
© 2026 New England Statistical Society
Open access article under the CC BY license.

Keywords
Time-dependent predictors; Cox proportional hazards model; Survival random forest; Survival tree; Concordance index; Brier score; Time-dependent AUC


  • ISSN: 2693-7166
  • Copyright © 2021 New England Statistical Society
