Highest Posterior Model Computation and Variable Selection via Simulated Annealing
Volume 1, Issue 2 (2023), pp. 200–207
Pub. online: 26 June 2023
Type: Statistical Methodology
Open Access
Accepted
30 May 2023
Published
26 June 2023
Abstract
Variable selection is widely used across application areas of data analytics, ranging from optimal selection of genes in large-scale microarray studies, to optimal selection of biomarkers for targeted therapy in cancer genomics, to selection of optimal predictors in business analytics. A formal way to perform this selection under the Bayesian approach is to select the model with the highest posterior probability. The problem may be thought of as an optimization problem over the model space, where the objective function is the posterior probability of the model. We propose to carry out this optimization using simulated annealing, and we illustrate its feasibility in high-dimensional problems. Various simulation studies show that this new approach is efficient. Theoretical justifications are provided and applications to high-dimensional datasets are discussed. The proposed method is implemented in the R package sahpm for general use and is available on CRAN.
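The optimization described in the abstract can be illustrated with a minimal Python sketch. It is not the sahpm implementation, and it makes several assumptions that do not come from the paper: the log posterior model probability is replaced by a BIC-based stand-in (a common large-sample approximation under a uniform prior over models), a proposal flips one randomly chosen inclusion indicator, and the temperature follows a geometric cooling schedule. The names `rss`, `log_score`, `anneal` and `simulate` are hypothetical.

```python
import math
import random


def rss(X, y, cols):
    """Residual sum of squares of least squares on an intercept plus columns `cols`."""
    n = len(y)
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(cols) + 1
    # Augmented normal equations [Z'Z | Z'y]; a tiny ridge keeps the pivots nonzero.
    A = [[sum(Z[i][a] * Z[i][b] for i in range(n)) + (1e-10 if a == b else 0.0)
          for b in range(k)] + [sum(Z[i][a] * y[i] for i in range(n))]
         for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    beta = [A[a][k] / A[a][a] for a in range(k)]
    return sum((y[i] - sum(b * z for b, z in zip(beta, Z[i]))) ** 2 for i in range(n))


def log_score(X, y, gamma):
    """Assumed objective: -BIC/2, a stand-in for the log posterior model probability."""
    n = len(y)
    cols = [j for j, g in enumerate(gamma) if g]
    return -0.5 * (n * math.log(rss(X, y, cols) / n) + (len(cols) + 1) * math.log(n))


def anneal(X, y, n_iter=1000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over inclusion vectors gamma in {0, 1}^p."""
    rng = random.Random(seed)
    p = len(X[0])
    gamma = [0] * p                      # start from the null model
    cur = log_score(X, y, gamma)
    best, best_score = gamma[:], cur
    t = t0
    for _ in range(n_iter):
        cand = gamma[:]
        j = rng.randrange(p)
        cand[j] = 1 - cand[j]            # propose adding or dropping one variable
        new = log_score(X, y, cand)
        # Always accept improvements; accept worse models with probability exp(diff / t).
        if new >= cur or rng.random() < math.exp((new - cur) / t):
            gamma, cur = cand, new
            if cur > best_score:
                best, best_score = gamma[:], cur
        t *= cooling                     # geometric cooling (an assumption)
    return best, best_score


def simulate(n=60, p=8, seed=1):
    """Toy data: only variables 0 and 3 affect the response."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] - 3.0 * row[3] + rng.gauss(0, 1) for row in X]
    return X, y


X, y = simulate()
best, best_score = anneal(X, y)
print("selected variables:", [j for j, g in enumerate(best) if g])
```

In this sketch the best model visited is tracked separately from the current state, so the returned model does not depend on where the chain happens to end. In a genuinely Bayesian version, `log_score` would be replaced by the log marginal likelihood plus the log model prior under the chosen prior specification.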
Supplementary material
The R package sahpm for the method SA-HPM is available on CRAN. Further mathematical discussion of the convergence of this method is given in separate supplementary material.