The New England Journal of Statistics in Data Science

Computationally Scalable Bayesian SPDE Modeling for Censored Spatial Responses
Indranil Sahoo, Suman Majumder, Arnab Hazra, et al. (5 authors)

https://doi.org/10.51387/25-NEJSDS78
Pub. online: 4 March 2025. Type: Methodology Article. Open Access.
Area: Spatial and Environmental Statistics

Accepted: 24 January 2025
Published: 4 March 2025

Abstract

Observations of groundwater pollutants, such as arsenic or perfluorooctane sulfonate (PFOS), are riddled with left censoring. These measurements have an impact on the health and lifestyle of the populace. Left censoring of these spatially correlated observations is usually addressed by applying Gaussian processes (GPs), which have theoretical advantages. However, GPs come with a challenging computational complexity of $\mathcal{O}(n^{3})$, where $n$ is the number of observations, which is impractical for large datasets. Additionally, a sizable proportion of left-censored data creates a further bottleneck, since the likelihood computation then involves an intractable high-dimensional integral of the multivariate Gaussian density. In this article, we tackle these two problems simultaneously by approximating the GP with a Gaussian Markov random field (GMRF), exploiting an explicit link between a GP with a Matérn correlation function and a GMRF through stochastic partial differential equations (SPDEs). We introduce a GMRF-based measurement error into the model, which eases the likelihood computation for the censored data and drastically improves computational speed while maintaining admirable accuracy. Our approach demonstrates robustness and substantial computational scalability compared to state-of-the-art methods for censored spatial responses across various simulation settings. Finally, fitting this fully Bayesian model to the concentration of PFOS in groundwater at 24,959 sites across California, where 46.62% of the responses are censored, produces a prediction surface and uncertainty quantification in real time, thereby substantiating the applicability and scalability of the proposed method. Code for implementation is made available via GitHub.
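The abstract does not spell out the model, so the following is only a generic sketch of why a measurement-error term can ease the censored likelihood, not the authors' implementation. It assumes that, conditional on a latent spatial field, each response is the latent value plus independent Gaussian noise with standard deviation `tau`. Under that assumption the high-dimensional integral reduces to a product of univariate normal CDFs, one per censored site. The function and argument names (`censored_loglik`, `latent`, `limits`, `tau`) are hypothetical.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def normal_logcdf(x, mean, sd):
    """Log of P(X <= x) for X ~ N(mean, sd^2).

    Uses erfc for better accuracy in the lower tail; for extremely
    negative standardized values the CDF underflows to zero and
    math.log raises ValueError, which this sketch does not handle.
    """
    z = (x - mean) / sd
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def censored_loglik(y, censored, limits, latent, tau):
    """Conditional log-likelihood of left-censored responses.

    Assumed (hypothetical) model: y_i = latent_i + e_i with
    e_i ~ N(0, tau^2) independently, so each site contributes either
    a density term (observed) or a univariate CDF term (censored).
    """
    total = 0.0
    for y_i, is_censored, c_i, w_i in zip(y, censored, limits, latent):
        if is_censored:
            # Only y_i <= c_i is known: contribute P(y_i <= c_i | w_i).
            total += normal_logcdf(c_i, w_i, tau)
        else:
            total += normal_logpdf(y_i, w_i, tau)
    return total


# Toy usage: three sites, the second is censored below its limit 0.5.
y = [1.2, 0.0, 0.8]            # value at a censored site is ignored
censored = [False, True, False]
limits = [0.5, 0.5, 0.5]
latent = [1.0, 0.4, 0.9]
print(censored_loglik(y, censored, limits, latent, tau=0.3))
```

The cost of this conditional likelihood is linear in the number of sites. In a full Bayesian fit the latent field would still need to be sampled or integrated out, which is where a sparse GMRF precision matrix would matter.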



Copyright
© 2025 New England Statistical Society
Open access article under the CC BY license.

Keywords
Censored high-dimensional spatial data; Gaussian Markov random field; Markov chain Monte Carlo; Measurement error model; Perfluorooctane sulfonate; Stochastic partial differential equation


  • ISSN: 2693-7166