The New England Journal of Statistics in Data Science

Sparse Estimation in Finite Mixture of Accelerated Failure Time and Mixture of Regression Models with R Package fmrs
Volume 2, Issue 3 (2024), pp. 339–356
Farhad Shokoohi
https://doi.org/10.51387/23-NEJSDS49
Pub. online: 23 October 2023
Type: Software Tutorial And/or Review
Open Access
Area: Software

Accepted
4 September 2023
Published
23 October 2023

Abstract

Variable selection in large-dimensional data has been extensively studied in different settings over the past decades. In a recent article, Shokoohi et al. [29, DOI:10.1214/18-AOAS1198] proposed a method for variable selection in finite mixture of accelerated failure time regression models for time-to-event data, which captures heterogeneity within the population and accounts for censoring. In this paper, we introduce the fmrs package, which implements the variable selection methodology for such models. As a byproduct, the fmrs package also facilitates variable selection in finite mixture regression models. The package incorporates a tuning parameter selection mechanism based on a component-wise Bayesian information criterion (BIC). Commonly used penalties, such as the Least Absolute Shrinkage and Selection Operator (LASSO) and the Smoothly Clipped Absolute Deviation (SCAD), are integrated into fmrs. Additionally, the package offers an option for non-mixture regression models. The optimization routines are written in C for speed. We provide an overview of the fmrs principles and the strategies employed for optimization. Hands-on illustrations are presented to help users get acquainted with fmrs. Finally, we apply fmrs to a lung cancer dataset and observe that a two-component mixture model reveals a subgroup with a more aggressive form of the disease and shorter survival times.
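
The abstract names LASSO and SCAD among the penalties available in fmrs. As a rough illustration of how the two differ, the sketch below evaluates the textbook form of each penalty for a single regression coefficient. It is written in Python for illustration only and is not the package's C implementation. The function names, the default a = 3.7, and the absence of any scaling by sample size or mixing proportion are assumptions made here, so the parameterization used inside fmrs may differ.

```python
def lasso_penalty(beta, lam):
    """Textbook LASSO penalty for one coefficient: lam * |beta|."""
    return lam * abs(beta)


def scad_penalty(beta, lam, a=3.7):
    """Textbook SCAD penalty for one coefficient (requires a > 2).

    Linear like LASSO near zero, quadratic in the middle range,
    and constant for large |beta|, so large effects are not
    shrunk further. The default a = 3.7 is an assumed value.
    """
    t = abs(beta)
    if t <= lam:
        # Same as LASSO for small coefficients.
        return lam * t
    if t <= a * lam:
        # Quadratic transition between the linear and flat regions.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Flat region: the penalty no longer grows with |beta|.
    return lam * lam * (a + 1) / 2


if __name__ == "__main__":
    lam = 1.0
    for beta in (0.5, 2.0, 10.0):
        print(beta, lasso_penalty(beta, lam), scad_penalty(beta, lam))
```

For small coefficients the two penalties coincide, while for large coefficients the LASSO penalty keeps growing and the SCAD penalty levels off, which is why SCAD tends to introduce less bias in large estimated effects.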

Supplementary material

Supplementary materials are available online with this paper at the New England Journal of Statistics in Data Science website. They include version 2.0.1 of the fmrs package, the R and Python code, and simulated datasets (the ’CodesAndData.zip’ file) for reproducibility and simulation studies.

References

[1] Basturk, N., Hoogerheide, L. F., Opschoor, A. and van Dijk, H. K. (2015). MitISEM: Mixture of Student t Distributions using Importance Sampling and Expectation Maximization. https://cran.r-project.org/web/packages/MitISEM/.
[2] Benaglia, T., Chauveau, D., Hunter, D. R. and Young, D. (2009). mixtools: An R Package for Analyzing Finite Mixture Models. Journal of Statistical Software 32(6) 1–29.
[3] Berk, R., Brown, L., Buja, A., Zhang, K. and Zhao, L. (2013). Valid post-selection inference. The Annals of Statistics 41(2) 802–837. https://doi.org/10.1214/12-AOS1077.
[4] Biernacki, C., Celeux, G. and Govaert, G. (2003). Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Computational Statistics & Data Analysis 41(3) 561–575. Recent Developments in Mixture Model. https://doi.org/10.1016/S0167-9473(02)00163-9. MR1968069
[5] Bilgrau, A. E., Eriksen, P. S., Rasmussen, J. G., Johnsen, H. E., Dybkaer, K. and Boegsted, M. (2016). GMCM: Unsupervised Clustering and Meta-Analysis Using Gaussian Mixture Copula Models. Journal of Statistical Software 70(2) 1–23. https://doi.org/10.18637/jss.v070.i02.
[6] Chambers, J. M. (1998) Programming with Data. Springer-Verlag, New York.
[7] Chambers, J. M. (2008) Software for Data Analysis: Programming with R. Springer, New York.
[8] Chiou, S. H., Kang, S. and Yan, J. (2014). Fitting Accelerated Failure Time Models in Routine Survival Analysis with R Package aftgee. Journal of Statistical Software 61(11) 1–23.
[9] Fraley, C., Raftery, A. E. and Scrucca, L. (2002). Inference for finite mixture models. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 64(3) 491–507.
[10] Fraley, C., Raftery, A. E., Murphy, T. B. and Scrucca, L. (2012). mclust Version 4 for R: Normal Mixture Modeling for Model-Based Clustering, Classification, and Density Estimation. https://doi.org/10.1007/s11222-009-9138-7. MR2725403
[11] Garay, A. M., Massuia, M. B. and Lachos, V. H. (2015). BayesCR: Bayesian Analysis of Censored Regression Models Under Scale Mixture of Skew Normal Distributions. https://cran.r-project.org/web/packages/BayesCR/.
[12] Garay, A. M., Massuia, M. B. and Lachos, V. (2015). SMNCensReg: Fitting Univariate Censored Regression Model Under the Family of Scale Mixture of Normal Distributions. https://cran.r-project.org/web/packages/SMNCensReg/.
[13] Grün, B. and Leisch, F. (2008). FlexMix Version 2: Finite Mixtures with Concomitant Variables and Varying and Constant Parameters. Journal of Statistical Software 28(4) 1–35.
[14] Hennig, C. (2004). Identifiability of mixtures of regression models. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 66(3) 593–615.
[15] Iannario, M. and Piccolo, D. (2015). CUB: A Class of Mixture Models for Ordinal Data. https://cran.r-project.org/web/packages/CUB/.
[16] Iovleff, S. (2015). MixAll: Clustering Heterogenous data with Missing Values. https://cran.r-project.org/web/packages/MixAll/.
[17] Jin, Z. (2016). Semiparametric accelerated failure time model for the analysis of right censored data. Communications for Statistical Applications and Methods 23(6) 467–478.
[18] Kamary, K. and Lee, K. (2015). Ultimixt: Bayesian Analysis of a Non-Informative Parametrization for Gaussian Mixture Distributions. https://cran.r-project.org/web/packages/Ultimixt/index.html.
[19] Lawless, J. F. (2003) Statistical Models and Methods for Lifetime Data, 2nd ed. Wiley Series in Probability and Statistics. MR1940115
[20] Lebret, R., Iovleff, S., Langrognet, F., Biernacki, C., Celeux, G. and Govaert, G. (2015). Rmixmod: The R Package of the Model-Based Unsupervised, Supervised, and Semi-Supervised Classification Mixmod Library. Journal of Statistical Software 67(6) 1–29. https://doi.org/10.18637/jss.v067.i06.
[21] Lee, S. X. and McLachlan, G. J. (2013). EMMIXuskew: An R Package for Fitting Mixtures of Multivariate Skew t Distributions via the EM Algorithm. Journal of Statistical Software 55(12) 1–22. https://doi.org/10.1007/s11222-012-9362-4. MR3165547
[22] Loprinzi, C. L., Laurie, J. A., Wieand, H. S. et al. (1994). Prospective evaluation of prognostic variables from patient-completed questionnaires. North Central Cancer Treatment Group. Journal of Clinical Oncology 12(3) 601–607. PMID: 8120560. https://doi.org/10.1200/JCO.1994.12.3.601.
[23] Mahani, A. S. and Sharabiani, M. T. A. (2015). BayesMixSurv: Bayesian Mixture Survival Models using Additive Mixture-of-Weibull Hazards, with Lasso Shrinkage and Stratification. https://cran.r-project.org/web/packages/BayesMixSurv/.
[24] McLachlan, G. and Peel, D. (2004) Finite Mixture Models. John Wiley & Sons. https://doi.org/10.1002/0471721182. MR1789474
[25] Panaro, R. V. (2020). spsurv: An R package for semi-parametric survival analysis. 2003.10548.
[26] Prates, M. O., Cabral, C. R. B. and Lachos, V. H. (2013). mixsmsn: Fitting Finite Mixture of Scale Mixture of Skew-Normal Distributions. Journal of Statistical Software 54(12) 1–20.
[27] Sanchez, L. B. and Lachos, V. H. (2015). CensMixReg: Censored Linear Mixture Regression Models. https://cran.r-project.org/web/packages/CensMixReg/.
[28] Schlattmann, P., Hoehne, J. and Verba, M. (2015). CAMAN: Finite Mixture Models and Meta-Analysis Tools - Based on C.A.MAN. https://cran.r-project.org/web/packages/CAMAN/.
[29] Shokoohi, F., Khalili, A., Asgharian, M. and Lin, S. (2019). Capturing heterogeneity of covariate effects in hidden subpopulations in the presence of censoring and large number of covariates. The Annals of Applied Statistics 13(1) 444–465. https://doi.org/10.1214/18-AOAS1198. MR3937436
[30] Silva, R. R. (2016). BayesH: Bayesian Regression Model with Mixture of Two Scaled Inverse Chi Square as Hyperprior. https://cran.r-project.org/web/packages/BayesH/.
[31] Städler, N. (2010). fmrlasso: Lasso for Finite Mixture of Regressions. http://mukherjeelab.nki.nl/stadler/fmrlasso_1.0.tar.gz.
[32] Su, S., Wuertz, D., Maechler, M., Rmetrics et al. (2015). GLDEX: Fitting Single and Mixture of Generalised Lambda Distributions (RS and FMKL) using Various Methods. https://cran.r-project.org/web/packages/GLDEX/.
[33] Tortora, C., Browne, R. P., Franczak, B. C. and McNicholas, P. D. (2015). MixGHD: Model Based Clustering, Classification and Discriminant Analysis Using the Mixture of Generalized Hyperbolic Distributions. https://cran.r-project.org/web/packages/MixGHD/.


Copyright
© 2024 New England Statistical Society
Open access article under the CC BY license.

Keywords
Feature selection; Penalized likelihood; The EM algorithm; Survival model; Lung Cancer

Funding
Farhad Shokoohi is supported by the University of Nevada - Las Vegas, through the Startup Grant PG18929.


  • ISSN: 2693-7166