Efficacy Analysis in Clinical Trials: A Comprehensive Review of Statistical and Machine Learning Approaches
Pub. online: 1 April 2026
Type: Timely Review Article
Open Access
Area: Biomedical Research
These authors contributed equally as first authors.
Accepted: 26 February 2026
Published: 1 April 2026
Abstract
Efficacy testing is a cornerstone of clinical trials: it establishes whether a medical intervention achieves its intended therapeutic effect. Over the decades, a wide range of statistical methodologies has been developed to address the complexities of clinical trial data, including parametric, nonparametric, Bayesian, and machine learning approaches. Parametric methods, such as t-tests, analysis of variance (ANOVA), and linear mixed models (LMMs), have traditionally been the foundation of efficacy testing because they are efficient when their assumptions hold. Nonparametric techniques, including the Friedman test, the Brunner-Munzel test, and modern extensions such as nparLD, have emerged as robust alternatives, particularly for skewed, ordinal, or otherwise non-normal data. Bayesian methodologies enable the incorporation of prior information and explicit uncertainty quantification, while machine learning techniques, such as deep learning and reinforcement learning, are reshaping trial design and outcome prediction. Despite these advances, significant gaps remain, including the handling of high-dimensional and missing data and the need for equitable efficacy testing across diverse populations. This review provides a comprehensive overview of these statistical methods, highlighting their applications, strengths, limitations, and future directions. By bridging traditional statistical frameworks with modern computational techniques, the field can continue to advance toward more reliable and personalized clinical trial methodologies.