Sparse Estimation in Finite Mixture of Accelerated Failure Time and Mixture of Regression Models with R Package fmrs
Pub. online: 23 October 2023 Type: Software Open Access
4 September 2023
Variable selection in large-dimensional data has been extensively studied in different settings over the past decades. In a recent article, Shokoohi et al. [29, DOI:10.1214/18-AOAS1198] proposed a method for variable selection in finite mixture of accelerated failure time regression models for time-to-event data, which captures heterogeneity within the population and accounts for censoring. In this paper, we introduce the fmrs package, which implements this variable selection methodology. As a byproduct, the fmrs package also facilitates variable selection in finite mixture of regression models. The package incorporates a tuning parameter selection mechanism based on a component-wise Bayesian information criterion (BIC). Commonly used penalties, such as the Least Absolute Shrinkage and Selection Operator (LASSO) and the Smoothly Clipped Absolute Deviation (SCAD), are integrated into fmrs. Additionally, the package offers an option for non-mixture regression models. The optimization is implemented in C for speed. We provide an overview of the principles behind fmrs and the strategies employed for optimization. Hands-on illustrations are presented to help users get acquainted with fmrs. Finally, we apply fmrs to a lung cancer dataset and observe that a two-component mixture model reveals a subgroup with a more aggressive form of the disease and shorter survival times.
Supplementary Material
Supplementary materials are available online with this paper at the New England Journal of Statistics in Data Science website. They include version 2.0.1 of the fmrs package, the R and Python code, and simulated datasets (the 'CodesAndData.zip' file) for reproducibility and simulation studies.
Basturk, N., Hoogerheide, L. F., Opschoor, A. and van Dijk, H. K. (2015). MitISEM: Mixture of Student t Distributions using Importance Sampling and Expectation Maximization. https://cran.r-project.org/web/packages/MitISEM/.
Berk, R., Brown, L., Buja, A., Zhang, K. and Zhao, L. (2013). Valid post-selection inference. The Annals of Statistics 41(2) 802–837. https://doi.org/10.1214/12-AOS1077.
Biernacki, C., Celeux, G. and Govaert, G. (2003). Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Computational Statistics & Data Analysis 41(3) 561–575. Recent Developments in Mixture Model. https://doi.org/10.1016/S0167-9473(02)00163-9. MR1968069
Bilgrau, A. E., Eriksen, P. S., Rasmussen, J. G., Johnsen, H. E., Dybkaer, K. and Boegsted, M. (2016). GMCM: Unsupervised Clustering and Meta-Analysis Using Gaussian Mixture Copula Models. Journal of Statistical Software 70(2) 1–23. https://doi.org/10.18637/jss.v070.i02.
Garay, A. M., Massuia, M. B. and Lachos, V. H. (2015). BayesCR: Bayesian Analysis of Censored Regression Models Under Scale Mixture of Skew Normal Distributions. https://cran.r-project.org/web/packages/BayesCR/.
Garay, A. M., Massuia, M. B. and Lachos, V. (2015). SMNCensReg: Fitting Univariate Censored Regression Model Under the Family of Scale Mixture of Normal Distributions. https://cran.r-project.org/web/packages/SMNCensReg/.
Iannario, M. and Piccolo, D. (2015). CUB: A Class of Mixture Models for Ordinal Data. https://cran.r-project.org/web/packages/CUB/.
Iovleff, S. (2015). MixAll: Clustering Heterogenous data with Missing Values. https://cran.r-project.org/web/packages/MixAll/.
Kamary, K. and Lee, K. (2015). Ultimixt: Bayesian Analysis of a Non-Informative Parametrization for Gaussian Mixture Distributions. https://cran.r-project.org/web/packages/Ultimixt/index.html.
Lawless, J. F. (2003). Statistical Models and Methods for Lifetime Data, 2nd ed. Wiley Series in Probability and Statistics. MR1940115
Lebret, R., Iovleff, S., Langrognet, F., Biernacki, C., Celeux, G. and Govaert, G. (2015). Rmixmod: The R Package of the Model-Based Unsupervised, Supervised, and Semi-Supervised Classification Mixmod Library. Journal of Statistical Software 67(6) 1–29. https://doi.org/10.18637/jss.v067.i06.
Loprinzi, C. L., Laurie, J. A., Wieand, H. S. et al. (1994). Prospective evaluation of prognostic variables from patient-completed questionnaires. North Central Cancer Treatment Group. Journal of Clinical Oncology 12(3) 601–607. PMID: 8120560. https://doi.org/10.1200/JCO.1994.12.3.601.
Mahani, A. S. and Sharabiani, M. T. A. (2015). BayesMixSurv: Bayesian Mixture Survival Models using Additive Mixture-of-Weibull Hazards, with Lasso Shrinkage and Stratification. https://cran.r-project.org/web/packages/BayesMixSurv/.
Panaro, R. V. (2020). spsurv: An R package for semi-parametric survival analysis. arXiv:2003.10548.
Sanchez, L. B. and Lachos, V. H. (2015). CensMixReg: Censored Linear Mixture Regression Models. https://cran.r-project.org/web/packages/CensMixReg/.
Schlattmann, P., Hoehne, J. and Verba, M. (2015). CAMAN: Finite Mixture Models and Meta-Analysis Tools - Based on C.A.MAN. https://cran.r-project.org/web/packages/CAMAN/.
Silva, R. R. (2016). BayesH: Bayesian Regression Model with Mixture of Two Scaled Inverse Chi Square as Hyperprior. https://cran.r-project.org/web/packages/BayesH/.
Städler, N. (2010). fmrlasso: Lasso for Finite Mixture of Regressions. http://mukherjeelab.nki.nl/stadler/fmrlasso_1.0.tar.gz.
Su, S., Wuertz, D., Maechler, M., Rmetrics et al. (2015). GLDEX: Fitting Single and Mixture of Generalised Lambda Distributions (RS and FMKL) using Various Methods. https://cran.r-project.org/web/packages/GLDEX/.
Tortora, C., Browne, R. P., Franczak, B. C. and McNicholas, P. D. (2015). MixGHD: Model Based Clustering, Classification and Discriminant Analysis Using the Mixture of Generalized Hyperbolic Distributions. https://cran.r-project.org/web/packages/MixGHD/.