The New England Journal of Statistics in Data Science

Bayesian Simultaneous Partial Envelope Model with Application to an Imaging Genetics Analysis
Yanbo Shen, Yeonhee Park, Saptarshi Chakraborty, et al. (4 authors)

https://doi.org/10.51387/23-NEJSDS23
Type: Statistical Methodology
Open Access
Accepted: 4 January 2023
Published online: 2 February 2023

Abstract

As a prominent dimension reduction method for multivariate linear regression, the envelope model has received increasing attention over the past decade because of its modeling flexibility and its success in improving the efficiency of estimation and prediction. Several enveloping approaches have been proposed in the literature; two of them are noteworthy: the partial response envelope model [57], which envelopes only the coefficients for the predictors of interest, and the simultaneous envelope model [14], which combines the predictor and response envelope models within a unified modeling framework. In this article we integrate these two approaches within a Bayesian framework and propose a novel Bayesian simultaneous partial envelope model that generalizes both approaches and addresses some of their limitations. Our method offers the flexibility to incorporate prior information when it is available, and it enables coherent quantification of all modeling uncertainty through the posterior distribution of the model parameters. We develop a block Metropolis-within-Gibbs algorithm for Markov chain Monte Carlo (MCMC) sampling from the posterior. The utility of our model is supported by theoretical results, comprehensive simulations, and an application to real imaging genetics data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study.
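For readers new to envelopes, the following is a minimal sketch of the basic response envelope model of [15]; it is not the Bayesian simultaneous partial envelope model proposed in this article. In that basic model, Y = α + βX + ε with β = Γη and Σ = ΓΩΓᵀ + Γ₀Ω₀Γ₀ᵀ, where the columns of the r × u matrix Γ span the envelope and Γ₀ spans its orthogonal complement. Consequently, the distribution of Γ₀ᵀY does not depend on X, and Γ₀ᵀY is uncorrelated with ΓᵀY given X. The partial envelope [57] imposes this structure only on the coefficients of the predictors of interest. The helper name response_envelope_params and the random construction of Γ, η, Ω and Ω₀ are hypothetical choices for illustration, and the sketch assumes NumPy is available.

```python
import numpy as np


def response_envelope_params(r, p, u, seed=0):
    """Hypothetical helper: build (beta, Sigma, Gamma, Gamma0) that satisfy the
    response envelope structure of Cook, Li and Chiaromonte (2010):
        beta  = Gamma @ eta
        Sigma = Gamma @ Omega @ Gamma.T + Gamma0 @ Omega0 @ Gamma0.T
    r = number of responses, p = number of predictors, u = envelope dimension.
    """
    rng = np.random.default_rng(seed)
    # A random r x r orthogonal matrix: its first u columns span the envelope,
    # and the remaining r - u columns span the orthogonal complement.
    Q, _ = np.linalg.qr(rng.standard_normal((r, r)))
    Gamma, Gamma0 = Q[:, :u], Q[:, u:]

    eta = rng.standard_normal((u, p))        # coordinates of beta with respect to Gamma
    A = rng.standard_normal((u, u))
    Omega = A @ A.T + np.eye(u)              # material variation (positive definite)
    B = rng.standard_normal((r - u, r - u))
    Omega0 = B @ B.T + np.eye(r - u)         # immaterial variation (positive definite)

    beta = Gamma @ eta                       # r x p matrix whose columns lie in span(Gamma)
    Sigma = Gamma @ Omega @ Gamma.T + Gamma0 @ Omega0 @ Gamma0.T
    return beta, Sigma, Gamma, Gamma0


beta, Sigma, Gamma, Gamma0 = response_envelope_params(r=5, p=2, u=2)
# The immaterial part of Y carries no regression signal.
print(np.allclose(Gamma0.T @ beta, 0))           # True
# Material and immaterial parts of Y are uncorrelated given X.
print(np.allclose(Gamma.T @ Sigma @ Gamma0, 0))  # True
```

Building Γ and Γ₀ from a single orthogonal matrix guarantees ΓᵀΓ₀ = 0, so both structural checks hold up to floating-point rounding regardless of the random draws.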

References

[1] 
Barnard, J., McCulloch, R. and Meng, X.-L. (2000). Modeling covariance matrices in terms of standard deviations and correlations, with application to shrinkage. Statistica Sinica 1281–1311.
[2] 
Boada, M., Antunez, C., Ramírez-Lorca, R., DeStefano, A. L., Gonzalez-Perez, A., Gayán, J., López-Arrieta, J., Ikram, M. A., Hernández, I., Marin, J. et al. (2014). ATP5H/KCTD2 locus is associated with Alzheimer’s disease risk. Molecular Psychiatry 19(6) 682–687.
[3] 
Broce, I. J., Tan, C. H., Fan, C. C., Jansen, I., Savage, J. E., Witoelar, A., Wen, N., Hess, C. P., Dillon, W. P., Glastonbury, C. M. et al. (2019). Dissecting the genetic relationship between cardiovascular risk factors and Alzheimer’s disease. Acta Neuropathologica 137(2) 209–226.
[4] 
Bura, E. and Cook, R. D. (2003). Rank estimation in reduced-rank regression. Journal of Multivariate Analysis 87(1) 159–176.
[5] 
Chakraborty, S. and Su, Z. A comprehensive Bayesian framework for envelope models. Technical Report.
[6] 
Chamberlain, G. (1982). Multivariate regression models for panel data. Journal of Econometrics 18(1) 5–46.
[7] 
Chang, C. C., Chow, C. C., Tellier, L. C., Vattikuti, S., Purcell, S. M. and Lee, J. J. (2015). Second-generation PLINK: Rising to the challenge of larger and richer datasets. GigaScience 4(1) 13742–015.
[8] 
Chen, T., Su, Z., Yang, Y., Ding, S. et al. (2020). Efficient estimation in expectile regression using envelope models. Electronic Journal of Statistics 14(1) 143–173.
[9] 
Chib, S. and Greenberg, E. (1998). Analysis of multivariate probit models. Biometrika 85(2) 347–361.
[10] 
Conway, J. (1990). A course in functional analysis. Springer, New York.
[11] 
Cook, R. D. and Zhang, X. (2015). Foundations for envelope models and methods. Journal of the American Statistical Association 110(510) 599–611.
[12] 
Cook, R. D., Helland, I. S. and Su, Z. (2013). Envelopes and partial least squares regression. Journal of the Royal Statistical Society: Series B 75(5) 851–877.
[13] 
Cook, R. D. (2018). An introduction to envelopes: Dimension reduction for efficient estimation in multivariate statistics.
[14] 
Cook, R. D. and Zhang, X. (2015). Simultaneous envelopes for multivariate linear regression. Technometrics 57(1) 11–25.
[15] 
Cook, R. D., Li, B. and Chiaromonte, F. (2010). Envelope models for parsimonious and efficient multivariate linear regression (with discussion). Statistica Sinica 20 927–1010.
[16] 
De Jong, S. (1993). SIMPLS: An alternative approach to partial least squares regression. Chemometrics and Intelligent Laboratory Systems 18(3) 251–263.
[17] 
Dibble, C. C., Elis, W., Menon, S., Qin, W., Klekota, J., Asara, J. M., Finan, P. M., Kwiatkowski, D. J., Murphy, L. O. and Manning, B. D. (2012). TBC1D7 is a third subunit of the TSC1-TSC2 complex upstream of mTORC1. Molecular Cell 47(4) 535–546.
[18] 
Ding, S. and Cook, R. D. (2018). Matrix variate regressions and envelope models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 80(2) 387–408.
[19] 
Ding, S., Su, Z., Zhu, G. and Wang, L. (2020). Envelope quantile regression. Statistica Sinica 31(1) 79–105.
[20] 
Dubois, B., Hampel, H., Feldman, H. H., Scheltens, P., Aisen, P., Andrieu, S., Bakardjian, H., Benali, H., Bertram, L., Blennow, K. et al.(2016). Preclinical Alzheimer’s disease: Definition, natural history, and diagnostic criteria. Alzheimer’s & Dementia 12(3) 292–323.
[21] 
Fischl, B. (2012). FreeSurfer. Neuroimage 62(2) 774–781.
[22] 
Geyer, C. J. (1998). Markov chain Monte Carlo lecture notes. Course notes, Spring Quarter 80.
[23] 
Greenlaw, K., Szefer, E., Graham, J., Lesperance, M., Nathoo, F. S. and Alzheimer’s Disease Neuroimaging Initiative (2017). A Bayesian group sparse multi-task regression model for imaging genetics. Bioinformatics 33(16) 2513–2522.
[24] 
Harold, D., Abraham, R., Hollingworth, P., Sims, R., Gerrish, A., Hamshere, M. L., Pahwa, J. S., Moskvina, V., Dowzell, K., Williams, A. et al. (2009). Genome-wide association study identifies variants at CLU and PICALM associated with Alzheimer’s disease. Nature Genetics 41(10) 1088–1093.
[25] 
Hibar, D. P., Stein, J. L., Kohannim, O., Jahanshad, N., Saykin, A. J., Shen, L., Kim, S., Pankratz, N., Foroud, T., Huentelman, M. J. et al. (2011). Voxelwise gene-wide association study (vGeneWAS): Multivariate gene-based association testing in 731 elderly subjects. NeuroImage 56(4) 1875–1891.
[26] 
Hotelling, H. (1936). Relations between two sets of variates. Biometrika 28(3-4) 321–377.
[27] 
Howie, B. N., Donnelly, P. and Marchini, J. (2009). A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLOS Genetics 5(6) e1000529.
[28] 
Howie, B., Fuchsberger, C., Stephens, M., Marchini, J. and Abecasis, G. R. (2012). Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nature Genetics 44(8) 955–959.
[29] 
Izenman, A. J. (1975). Reduced-rank regression for the multivariate linear model. Journal of Multivariate Analysis 5(2) 248–264.
[30] 
Jansen, I. E., Savage, J. E., Watanabe, K., Bryois, J., Williams, D. M., Steinberg, S., Sealock, J., Karlsson, I. K., Hägg, S., Athanasiu, L. et al. (2019). Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nature Genetics 51(3) 404–413.
[31] 
Karch, C. M., Ezerskiy, L. A., Bertelsen, S., Alzheimer’s Disease Genetics Consortium (ADGC) and Goate, A. M. (2016). Alzheimer’s disease risk polymorphisms regulate gene expression in the ZCWPW1 and the CELF1 loci. PLOS One 11(2) e0148717.
[32] 
Kendall, M. G. (1957). A course in multivariate analysis. Technical Report.
[33] 
Khare, K., Pal, S. and Su, Z. (2017). A Bayesian approach for envelope models. The Annals of Statistics 45(1) 196–222.
[34] 
Kim, S., Sohn, K.-A. and Xing, E. P. (2009). A multivariate regression approach to association analysis of a quantitative trait network. Bioinformatics 25(12) i204–i212.
[35] 
Kumar, N., Bansal, A., Sarma, G. and Rawal, R. K. (2014). Chemometrics tools used in analytical chemistry: An overview. Talanta 123 186–199.
[36] 
Lambert, J.-C., Ibrahim-Verbaas, C. A., Harold, D., Naj, A. C., Sims, R., Bellenguez, C., Jun, G., DeStefano, A. L., Bis, J. C., Beecham, G. W. et al. (2013). Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nature Genetics 45(12) 1452–1458.
[37] 
Lee, M. and Su, Z. (2020). A review of envelope models. International Statistical Review 88(3) 658–676.
[38] 
Lee, M., Chakraborty, S. and Su, Z. (2022). A Bayesian approach to envelope quantile regression. Statistica Sinica 32 1–19.
[39] 
Li, L. and Zhang, X. (2017). Parsimonious tensor response regression. Journal of the American Statistical Association 112(519) 1131–1146.
[40] 
Liu, J. S. and Wu, Y. N. (1999). Parameter expansion for data augmentation. Journal of the American Statistical Association 94(448) 1264–1274.
[41] 
Naj, A. C., Leonenko, G., Jian, X., Grenier-Boley, B., Dalmasso, M. C., Bellenguez, C., Sha, J., Zhao, Y., van der Lee, S. J., Sims, R. et al. (2021). Genome-wide meta-analysis of late-onset Alzheimer’s disease using rare variant imputation in 65,602 subjects identifies novel rare variant locus NCK2: The International Genomics of Alzheimer’s Project (IGAP). medRxiv.
[42] 
Nathoo, F. S., Kong, L., Zhu, H. and Alzheimer’s Disease Neuroimaging Initiative (2019). A review of statistical methods in imaging genetics. Canadian Journal of Statistics 47(1) 108–131.
[43] 
Nestor, S. M., Rupsingh, R., Borrie, M., Smith, M., Accomazzi, V., Wells, J. L., Fogarty, J., Bartha, R. and Alzheimer’s Disease Neuroimaging Initiative (2008). Ventricular enlargement as a possible measure of Alzheimer’s disease progression validated using the Alzheimer’s disease neuroimaging initiative database. Brain 131(9) 2443–2454.
[44] 
Noorossana, R., Eyvazian, M., Amiri, A. and Mahmoud, M. A. (2010). Statistical monitoring of multivariate multiple linear regression profiles in phase I with calibration application. Quality and Reliability Engineering International 26(3) 291–303.
[45] 
Park, Y., Su, Z. and Chung, D. (2022). Envelope-based partial partial least squares with application to cytokine-based biomarker analysis for COVID-19. Statistics in Medicine 1–15.
[46] 
Park, Y., Su, Z. and Zhu, H. (2017). Groupwise envelope models for imaging genetic analysis. Biometrics 73(4) 1243–1253.
[47] 
Poulin, S. P., Dautoff, R., Morris, J. C., Barrett, L. F., Dickerson, B. C. and Alzheimer’s Disease Neuroimaging Initiative (2011). Amygdala atrophy is prominent in early Alzheimer’s disease and relates to symptom severity. Psychiatry Research: Neuroimaging 194(1) 7–13.
[48] 
Roberts, G. O. and Rosenthal, J. S. (2006). Harris recurrence of Metropolis-within-Gibbs and trans-dimensional Markov chains. The Annals of Applied Probability 16(4) 2123–2139.
[49] 
Rosenthal, S. L., Barmada, M. M., Wang, X., Demirci, F. Y. and Kamboh, M. I. (2014). Connecting the dots: Potential of data integration to identify regulatory SNPs in late-onset Alzheimer’s disease GWAS findings. PLOS One 9(4) e95152.
[50] 
Rowe, B. (2020). Bayesian variable selection methods for genome wide association studies with categorical phenotypes. PhD thesis, University of Nevada, Las Vegas.
[51] 
Selige, T., Böhner, J. and Schmidhalter, U. (2006). High resolution topsoil mapping using hyperspectral image and field data in multivariate regression modeling procedures. Geoderma 136(1-2) 235–244.
[52] 
Song, Y., Ge, S., Cao, J., Wang, L. and Nathoo, F. S. (2022). A Bayesian spatial model for imaging genetics. Biometrics 78(2) 742–753.
[53] 
Squillario, M., Tomasi, F., Tozzo, V., Barla, A., Uberti, D. and Alzheimer’s Disease Neuroimaging Initiative (2018). A 3-fold kernel approach for characterizing late-onset Alzheimer’s disease. bioRxiv 397760.
[54] 
Stein, J. L., Hua, X., Lee, S., Ho, A. J., Leow, A. D., Toga, A. W., Saykin, A. J., Shen, L., Foroud, T., Pankratz, N. et al. (2010). Voxelwise genome-wide association study (vGWAS). NeuroImage 53(3) 1160–1174.
[55] 
Strickland, S. L., Reddy, J. S., Allen, M., N’songo, A., Burgess, J. D., Corda, M. M., Ballard, T., Wang, X., Carrasquillo, M. M., Biernacka, J. M. et al. (2020). MAPT haplotype–stratified GWAS reveals differential association for AD risk variants. Alzheimer’s & Dementia 16(7) 983–1002.
[56] 
Su, Z., Zhu, G., Chen, X. and Yang, Y. (2016). Sparse envelope model: Efficient estimation and response variable selection in multivariate linear regression. Biometrika 103(3) 579–593.
[57] 
Su, Z. and Cook, R. D. (2011). Partial envelopes for efficient estimation in multivariate linear regression. Biometrika 98 133–146.
[58] 
Su, Z. and Cook, R. D. (2013). Estimation of multivariate means with heteroscedastic errors using envelope models. Statistica Sinica 23(1) 213–230.
[59] 
Talhouk, A., Doucet, A. and Murphy, K. (2012). Efficient Bayesian inference for multivariate probit models with sparse inverse correlation matrices. Journal of Computational and Graphical Statistics 21(3) 739–757.
[60] 
Tan, M.-S., Yang, Y.-X., Xu, W., Wang, H.-F., Tan, L., Zuo, C.-T., Dong, Q., Tan, L., Suckling, J. and Yu, J.-T. (2021). Associations of Alzheimer’s disease risk variants with gene expression, amyloidosis, tauopathy, and neurodegeneration. Alzheimer’s Research & Therapy 13(1) 1–11.
[61] 
Torvell, M., Carpanini, S. M., Daskoulidou, N., Byrne, R. A., Sims, R. and Morgan, B. P. (2021). Genetic Insights into the Impact of Complement in Alzheimer’s Disease. Genes 12(12) 1990.
[62] 
Veera Manikandan, R. and Anand, R. (2015). P2-012: A genome wide scan for genetic variations with inverse association between Alzheimer’s disease and breast cancer. Alzheimer’s & Dementia 11(7S, Part 10) 485.
[63] 
Visscher, P. M., Wray, N. R., Zhang, Q., Sklar, P., McCarthy, M. I., Brown, M. A. and Yang, J. (2017). 10 years of GWAS discovery: Biology, function, and translation. The American Journal of Human Genetics 101(1) 5–22.
[64] 
Vounou, M., Nichols, T. E., Montana, G. and Alzheimer’s Disease Neuroimaging Initiative (2010). Discovering genetic associations with high-dimensional neuroimaging phenotypes: A sparse reduced-rank regression approach. NeuroImage 53(3) 1147–1159.
[65] 
Wang, H., Nie, F., Huang, H., Risacher, S. L., Saykin, A. J., Shen, L. and Alzheimer’s Disease Neuroimaging Initiative (2012). Identifying disease sensitive and quantitative trait-relevant biomarkers from multidimensional heterogeneous imaging genetics data via sparse multimodal multitask learning. Bioinformatics 28(12) i127–i136.
[66] 
White, C. C., Yang, H.-S., Schneider, J. A., Bennett, D. A., De Jager, P. L. and group, C. A. F. (2021). A genome-wide investigation of clinicopathologic endophenotypes uncovers a new susceptibility locus for tau pathology at Neurotrimin (NTM). Alzheimer’s & Dementia 17 051682.
[67] 
Zhang, C. and Yu, T. (2008). Semiparametric detection of significant activation for brain fMRI. The Annals of Statistics 36(4) 1693–1725.
[68] 
Zhu, H., Khondker, Z., Lu, Z. and Ibrahim, J. G. (2014). Bayesian generalized low rank regression models for neuroimaging phenotypes and genetic markers. Journal of the American Statistical Association 109(507) 977–990.


Copyright
© 2023 New England Statistical Society
Open access article under the CC BY license.

Keywords
Bayesian envelope model; Multivariate regression; Reducing subspace; Simultaneous envelope; Partial envelope; Imaging genetics

Funding
C. Zhang’s work was partially supported by U.S. National Science Foundation grants DMS-2013486 and DMS-1712418, and by the University of Wisconsin-Madison Office of the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation. Park’s research is partially supported by the University of Wisconsin-Madison Office of the Vice Chancellor for Research and Graduate Education. Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie; Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California.
ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.


ISSN: 2693-7166