A Comparison of Methods for Estimating the Average Treatment Effect on the Treated for Externally Controlled Trials
Pub. online: 13 March 2025
Type: Statistical Methodology
Open Access
Contributed equally.
Accepted: 24 January 2025
Published: 13 March 2025
Abstract
While randomized trials may be the gold standard for evaluating the effectiveness of a treatment intervention, single-arm clinical trials that use an external control may be considered in some special circumstances. For single-arm trials, the causal treatment effect of interest is usually the average treatment effect on the treated (ATT) rather than the average treatment effect (ATE). Although methods have been developed to estimate the ATT, choosing and applying them requires a thorough comparison and an in-depth understanding of their advantages and disadvantages. In this study, we conducted simulations under different identifiability assumptions to compare performance metrics (e.g., bias, standard deviation (SD), mean squared error (MSE), and type I error rate) across a variety of methods, including the regression model, propensity score matching (PSM), Mahalanobis distance matching (MDM), coarsened exact matching, inverse probability weighting, augmented inverse probability weighting (AIPW), AIPW with SuperLearner, and the targeted maximum likelihood estimator (TMLE) with SuperLearner.
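The ATT is E[Y(1) − Y(0) | T = 1], the effect among units that actually received treatment, so the control outcomes have to be reweighted or modeled toward the covariate distribution of the treated. The Python sketch below contrasts a naive difference in means with inverse probability weighting for the ATT (treated units weighted by 1, controls by the odds e(X)/(1 − e(X))) and an AIPW-type doubly robust ATT estimator. It is a minimal illustration under assumptions that are ours, not the paper's: a single confounder, a hypothetical data-generating process with a constant effect of 1, a known true propensity score (in practice it would be estimated, e.g. by logistic regression), and a linear outcome model fitted by OLS on the controls. The names `simulate` and `att_estimates` are hypothetical.

```python
import math
import random


def simulate(n, tau=1.0, seed=2025):
    """Hypothetical design: one confounder X, treatment T, outcome Y with constant effect tau."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = 1.0 / (1.0 + math.exp(-(0.5 + x)))  # true propensity score P(T=1 | X=x)
        t = 1 if rng.random() < e else 0
        y = 1.0 + 2.0 * x + tau * t + rng.gauss(0.0, 1.0)
        rows.append((x, t, y, e))
    return rows


def att_estimates(rows):
    treated = [(x, y, e) for x, t, y, e in rows if t == 1]
    controls = [(x, y, e) for x, t, y, e in rows if t == 0]
    n1, n0 = len(treated), len(controls)

    # Naive difference in means: ignores confounding by X.
    naive = sum(y for _, y, _ in treated) / n1 - sum(y for _, y, _ in controls) / n0

    # IPW for the ATT: treated units get weight 1; controls get the odds e/(1-e),
    # which reweights them toward the covariate distribution of the treated.
    # Assumption: e is the TRUE propensity score; in practice it is estimated.
    w = [e / (1.0 - e) for _, _, e in controls]
    w_sum = sum(w)
    mean_treated = sum(y for _, y, _ in treated) / n1
    ipw = mean_treated - sum(wi * y for wi, (_, y, _) in zip(w, controls)) / w_sum

    # Outcome model m0(x) = a + b*x for E[Y(0) | X], fitted by OLS on the controls.
    xbar = sum(x for x, _, _ in controls) / n0
    ybar = sum(y for _, y, _ in controls) / n0
    sxy = sum((x - xbar) * (y - ybar) for x, y, _ in controls)
    sxx = sum((x - xbar) ** 2 for x, _, _ in controls)
    b = sxy / sxx
    a = ybar - b * xbar

    def m0(x):
        return a + b * x

    # AIPW-type doubly robust ATT: mean treated residual Y - m0(X) minus the
    # odds-weighted mean control residual (weights normalized to sum to 1).
    aipw = (sum(y - m0(x) for x, y, _ in treated) / n1
            - sum(wi * (y - m0(x)) for wi, (x, y, _) in zip(w, controls)) / w_sum)
    return {"naive": naive, "ipw": ipw, "aipw": aipw}


estimates = att_estimates(simulate(5000))
print({name: round(value, 3) for name, value in estimates.items()})
```

With a correct propensity score the IPW estimate should land near the true ATT of 1, and the AIPW estimate stays near 1 if either the propensity score or the outcome model is correct; the naive difference is pushed well above 1 by the confounder X. The exact numbers depend on the seed.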
Our simulation results demonstrate that the doubly robust methods generally have smaller bias than the other methods. In terms of SD, nonmatching methods generally have smaller SDs than matching-based methods. MSE reflects a trade-off between bias and SD, and no method consistently performs best in terms of MSE. The identifiability assumptions are critical to the methods' performance: violation of the positivity assumption can substantially inflate the type I error rate of some methods, and violation of the unconfoundedness assumption can lead to large bias for all methods.
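Bias, SD, and MSE are computed over replicate estimates of the same estimand, and MSE decomposes as bias² + SD², which is why a slightly biased but stable method can beat an unbiased but noisy one. The Python sketch below assumes SD means the spread of the estimates across replicates; it uses the population SD (denominator n) so the decomposition holds exactly, whereas the sample SD (denominator n − 1) makes it approximate. The function name `performance_metrics` is hypothetical, and the two sets of estimates are made-up numbers for illustration, not results from the paper.

```python
import math
import statistics


def performance_metrics(estimates, truth):
    """Empirical bias, SD, and MSE of one estimator over simulation replicates."""
    bias = statistics.fmean(estimates) - truth
    sd = statistics.pstdev(estimates)  # population SD (denominator n)
    mse = statistics.fmean((est - truth) ** 2 for est in estimates)
    return bias, sd, mse


# Made-up replicate estimates of an ATT whose true value is 1.0.
unbiased_noisy = [0.6, 1.4, 0.8, 1.2]  # bias about 0, larger spread
biased_stable = [1.2, 1.3, 1.2, 1.3]   # bias 0.25, small spread

for label, ests in [("unbiased_noisy", unbiased_noisy), ("biased_stable", biased_stable)]:
    bias, sd, mse = performance_metrics(ests, truth=1.0)
    assert math.isclose(mse, bias ** 2 + sd ** 2)  # MSE = bias^2 + SD^2
    print(f"{label}: bias={bias:.3f} sd={sd:.3f} mse={mse:.4f}")
```

Here the biased but stable estimator ends up with the smaller MSE (0.065 versus 0.1), the kind of trade-off that keeps any single method from dominating on MSE.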
According to the simulation results, PSM and MDM perform best overall in terms of type I error control under most of the scenarios we examined. However, they generally have lower estimation accuracy than the doubly robust methods when the identifiability assumptions are not severely violated.
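Type I error is assessed by simulating under the null (true ATT = 0) and recording how often a nominal α-level test rejects; a well-calibrated method rejects in close to a proportion α of replicates, and inflation signals problems such as underestimated standard errors. The Python sketch below assumes a two-sided Wald test with a normal critical value and uses a hypothetical null scenario (two independent normal samples, difference in means with its usual standard error). The helper names are hypothetical, and the sketch is not meant to reproduce how the paper computed standard errors for each method.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def rejects(estimate, se, alpha=0.05):
    """Two-sided Wald test of H0: ATT = 0 using a normal critical value."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return abs(estimate / se) > z_crit


def type_one_error_rate(estimates, ses, alpha=0.05):
    """Share of null replicates in which H0 is (wrongly) rejected."""
    flags = [rejects(est, se, alpha) for est, se in zip(estimates, ses)]
    return sum(flags) / len(flags)


def null_replicate(rng, n=100):
    """Hypothetical null scenario: no treatment effect, difference in means."""
    treated = [rng.gauss(0.0, 1.0) for _ in range(n)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    est = fmean(treated) - fmean(control)
    se = math.sqrt(variance(treated) / n + variance(control) / n)
    return est, se


rng = random.Random(7)
reps = [null_replicate(rng) for _ in range(2000)]
rate = type_one_error_rate([e for e, _ in reps], [s for _, s in reps])
print(f"empirical type I error: {rate:.3f} (nominal 0.05)")
```

Under this correctly specified null the empirical rate should be close to the nominal 0.05, up to Monte Carlo error of roughly ±0.01 with 2,000 replicates; a method whose standard errors are too small would show a rate well above 0.05.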
Supplementary material
Supplementary materials, which include Figures S1–S10, are available online with this paper at the New England Journal of Statistics in Data Science website.