Computationally Scalable Bayesian SPDE Modeling for Censored Spatial Responses
Pub. online: 4 March 2025
Type: Spatial And Environmental Statistics
Open Access
Accepted: 24 January 2025
Published: 4 March 2025
Abstract
Observations of groundwater pollutants, such as arsenic or perfluorooctane sulfonate (PFOS), are riddled with left censoring. The concentrations being measured have an impact on the health and lifestyle of the populace. Left censoring of these spatially correlated observations is usually addressed with Gaussian processes (GPs), which have theoretical advantages. However, GPs carry a computational complexity of $\mathcal{O}(n^{3})$ in the number of locations $n$, which is impractical for large datasets. Additionally, a sizable proportion of left-censored observations creates a further bottleneck, since the likelihood then involves an intractable high-dimensional integral of the multivariate Gaussian density. In this article, we tackle these two problems simultaneously by approximating the GP with a Gaussian Markov random field (GMRF), exploiting the explicit link between a GP with Matérn correlation function and a GMRF established through stochastic partial differential equations (SPDEs). We introduce a GMRF-based measurement error into the model, which alleviates the likelihood computation for the censored data, drastically improving computational speed while maintaining high accuracy. Across various simulation settings, our approach demonstrates robustness and substantial computational scalability compared to state-of-the-art methods for censored spatial responses. Finally, fitting this fully Bayesian model to PFOS concentrations in groundwater at 24,959 sites across California, where 46.62% of responses are censored, produces a prediction surface and uncertainty quantification in real time, thereby substantiating the applicability and scalability of the proposed method. Code for implementation is made available via GitHub.
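The computational point about measurement error can be illustrated generically. Once an independent error term is added, the observations become conditionally independent given the latent spatial field, so the censored likelihood factorizes into univariate Gaussian densities (observed values) and univariate Gaussian CDFs (censored values) instead of one high-dimensional multivariate normal integral. The sketch below is a minimal illustration under stated assumptions, not the authors' GitHub code: it assumes $Y(s_i) = \mu_i + \varepsilon_i$ with $\varepsilon_i \sim N(0, \tau^2)$ i.i.d., where $\mu_i = x_i^\top\beta + w(s_i)$ is the conditional mean given the latent field $w$ (which, in an MCMC sampler, would be drawn from the SPDE-based GMRF), and that a censored entry stores its detection limit. The function names are hypothetical.

```python
import math
from typing import Sequence


def log_norm_pdf(y: float, mean: float, sd: float) -> float:
    """Log density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def log_norm_cdf(z: float) -> float:
    """log Phi(z) via the complementary error function (stdlib only).

    Underflows for very negative z; scipy.stats.norm.logcdf is more robust.
    """
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def censored_loglik(
    y: Sequence[float],
    censored: Sequence[bool],
    latent_mean: Sequence[float],
    tau: float,
) -> float:
    """Conditional log-likelihood of left-censored data given the latent field.

    Hypothetical helper. For censored sites y[i] holds the detection limit c_i,
    and the contribution is log P(Y_i <= c_i) = log Phi((c_i - mu_i) / tau).
    Because the errors are independent given mu, the cost is O(n): no
    multivariate normal integral over the censored block is needed.
    """
    total = 0.0
    for yi, ci, mi in zip(y, censored, latent_mean):
        if ci:
            total += log_norm_cdf((yi - mi) / tau)  # below detection limit
        else:
            total += log_norm_pdf(yi, mi, tau)  # fully observed value
    return total


if __name__ == "__main__":
    y = [1.2, 0.5, 2.0]              # 0.5 is a detection limit
    censored = [False, True, False]
    mu = [1.0, 0.8, 1.5]             # x_i' beta + w(s_i), assumed given
    print(censored_loglik(y, censored, mu, tau=0.5))
```

In a full sampler, this conditional likelihood would be combined with the sparse GMRF prior on $w$; the dense $\mathcal{O}(n^{3})$ GP factorization and the censored-block integral are both avoided in this illustrative setup.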