The New England Journal of Statistics in Data Science

Bayesian Variable Selection in Double Generalized Linear Tweedie Spatial Process Models
Volume 1, Issue 2 (2023), pp. 187–199
Aritra Halder¹, Shariq Mohammed¹, Dipak K. Dey

https://doi.org/10.51387/23-NEJSDS37
Pub. online: 19 June 2023
Type: Methodology Article
Open Access
Area: Statistical Methodology

1 Equal contribution.

Accepted: 31 May 2023
Published: 19 June 2023

Abstract

Double generalized linear models provide a flexible framework for modeling data by allowing both the mean and the dispersion to vary across observations. Common members of the exponential dispersion family, including the Gaussian, Poisson, compound Poisson-gamma (CP-g), Gamma, and inverse-Gaussian, are known to admit such models. Their limited use can be attributed to ambiguities in model specification under a large number of covariates and to complications that arise when data display complex spatial dependence. In this work we consider a hierarchical specification for the CP-g model with a spatial random effect. The spatial effect supports uncertainty quantification by modeling dependence within the data arising from location-based indexing of the response. We focus on a Gaussian process specification for the spatial effect. Simultaneously, we tackle the problem of model specification for such models using Bayesian variable selection, effected through a continuous spike and slab prior on the model parameters, specifically the fixed effects. The novelty of our contribution lies in the Bayesian frameworks developed for such models. We perform various synthetic experiments to showcase the accuracy of our frameworks. They are then applied to analyze automobile insurance premiums in Connecticut for the year 2008.
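To make the data-generating structure described in the abstract concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes log links for both the mean and the dispersion, a spatial effect that enters only the mean, an exponential covariance for the Gaussian process, and a fixed Tweedie power p = 1.5; none of these choices is stated in the abstract. The function name `simulate_cpg` and all parameter values are hypothetical. The sampler uses the standard compound Poisson-gamma representation of the Tweedie family for 1 < p < 2, with mean mu and variance phi * mu^p.

```python
import numpy as np


def simulate_cpg(mu, phi, p, rng):
    """Draw compound Poisson-gamma (Tweedie, 1 < p < 2) responses.

    Each response is a sum of N gamma variables with N ~ Poisson(lam),
    parametrized so that E[y] = mu and Var[y] = phi * mu**p.
    """
    mu, phi = np.broadcast_arrays(np.asarray(mu, dtype=float),
                                  np.asarray(phi, dtype=float))
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))    # Poisson rate
    shape = (2.0 - p) / (p - 1.0)                # gamma shape per summand
    scale = phi * (p - 1.0) * mu ** (p - 1.0)    # gamma scale per summand
    n = rng.poisson(lam)
    y = np.zeros(mu.shape)                       # N = 0 gives an exact zero
    pos = n > 0
    # A sum of n iid Gamma(shape, scale) draws is Gamma(n * shape, scale).
    y[pos] = rng.gamma(shape * n[pos], scale[pos])
    return y


rng = np.random.default_rng(0)
n_obs = 200

# Assumed spatial layout: random locations in the unit square.
coords = rng.uniform(0.0, 1.0, size=(n_obs, 2))
dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

# Assumed exponential covariance (variance 1, range 0.3) for the GP effect.
cov = np.exp(-dists / 0.3)
chol = np.linalg.cholesky(cov + 1e-8 * np.eye(n_obs))
w = chol @ rng.standard_normal(n_obs)

# Hypothetical covariates; zero coefficients stand in for irrelevant
# covariates that a variable selection prior would aim to shrink.
X = rng.standard_normal((n_obs, 3))
Z = rng.standard_normal((n_obs, 2))
beta = np.array([1.0, 0.0, -0.5])    # mean-model fixed effects
gamma = np.array([0.5, 0.0])         # dispersion-model fixed effects

mu = np.exp(X @ beta + w)            # mean varies by covariates and location
phi = np.exp(Z @ gamma)              # dispersion varies by covariates
y = simulate_cpg(mu, phi, p=1.5, rng=rng)
```

The simulated `y` is non-negative with a point mass at zero, which is the feature that makes the CP-g model a natural fit for data such as insurance premiums or claims.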

Supplementary material

Supplementary Material containing further details as described in Section 4 is available online. The R package is available for installation at: https://github.com/arh926/sptwdglm.



Copyright
© 2023 New England Statistical Society
Open access article under the CC BY license.

Keywords
Bayesian Modeling; Gaussian Process; Hierarchical Spatial Process Models; Spike and Slab Priors; Tweedie Double Generalized Linear Models

Funding
Shariq Mohammed was supported by institutional research funds from Boston University (BU) School of Public Health and Rafik B. Hariri Institute for Computing and Computational Science & Engineering at BU.

ISSN: 2693-7166