1 Introduction
Multivariate spatial data consist of multiple variables that exhibit inherent associations among themselves as well as underlying spatial associations unique to each of them. While modeling each variable separately captures its spatial distribution independent of other variables, the resulting data analysis fails to account for associations among the variables, which can substantially impair prediction or interpolation [see, e.g., 9, 11, 21, 50]. Joint modeling of multivariate spatial data proceeds from vector-valued latent spatial stochastic processes or random fields, such as a multivariate Gaussian process. These are specified with matrix-valued cross-covariance functions [see, e.g., 24, 43, 33, and references therein] that model pairwise associations at distinct locations. While theoretical characterizations of cross-covariance functions are well established, modeling implications and practicability depend upon the specific application [see, e.g., 5, 34, 46, 32, 20, 44].
Multivariate spatial processes are customarily specified using a vector of mean functions and a cross-covariance matrix function. Let $z(s)={({z_{1}}(s),\dots ,{z_{q}}(s))^{\mathrm{T}}}$ be a $q\times 1$ stochastic process, where each ${z_{i}}(s)$ is a real-valued random variable at location $s\in \mathcal{D}\subseteq {\mathrm{\Re }^{d}}$. The process is specified by its means $\text{E}[{z_{i}}(s)]={\mu _{i}}(s)$ and the second-order covariance functions ${C_{ij}}(s,{s^{\prime }})=\text{Cov}\{{z_{i}}(s),{z_{j}}({s^{\prime }})\}$ for $s,{s^{\prime }}\in {\mathrm{\Re }^{d}}$ and $i,j=1,2,\dots ,q$. These covariances define the matrix-valued $q\times q$ cross-covariance function $C(s,{s^{\prime }})=\{{C_{ij}}(s,{s^{\prime }})\}$ with $(i,j)$-th entry ${C_{ij}}(s,{s^{\prime }})$. Since the mean structure can be estimated by absorbing it into a regression component, i.e., ${z_{i}}(s)={\mu _{i}}(s)+{w_{i}}(s)$, where ${w_{i}}(s)$ has zero mean, we focus on constructing a valid cross-covariance function for a zero-mean latent process. From its definition, $C(s,{s^{\prime }})$ need not be symmetric, but must satisfy $C{(s,{s^{\prime }})^{\mathrm{T}}}=C({s^{\prime }},s)$. Also, since $\text{var}\{{\textstyle\sum _{i=1}^{n}}{a_{i}^{\mathrm{T}}}z({s_{i}})\}\gt 0$ for any finite set of distinct locations ${s_{1}},{s_{2}},\dots ,{s_{n}}\in \mathcal{D}$ and any set of nonzero constant vectors ${a_{1}},{a_{2}},\dots ,{a_{n}}\in {\mathrm{\Re }^{q}}\setminus \{0\}$, we have ${\textstyle\sum _{i=1}^{n}}{\textstyle\sum _{j=1}^{n}}{a_{i}^{\mathrm{T}}}C({s_{i}},{s_{j}}){a_{j}}\gt 0$. See [24] for a review.
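As a toy numerical check of this requirement, the following sketch (an assumed example, not code from the paper) builds the block covariance matrix for a separable cross-covariance $C(s,{s^{\prime }})=\exp (-\phi \| s-{s^{\prime }}\| )\,T$ with T positive definite and verifies positive definiteness:

```python
# Hypothetical illustration: assemble the nq x nq block covariance matrix
# [C(s_i, s_j)] for a separable cross-covariance and check positive definiteness.
import numpy as np

rng = np.random.default_rng(0)
n, q, phi = 20, 3, 2.0
S = rng.uniform(size=(n, 2))                               # n locations in 2-D
rho = np.exp(-phi * np.linalg.norm(S[:, None] - S[None, :], axis=-1))
A = rng.normal(size=(q, q))
T = A @ A.T + np.eye(q)                                    # valid intra-site covariance
C = np.kron(rho, T)                                        # (i, j)-th q x q block is rho_ij * T
assert np.all(np.linalg.eigvalsh(C) > 0)                   # sum_i sum_j a_i^T C(s_i, s_j) a_j > 0
```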
Spatial factor models [35, 40, 47, 53] are among the most widely used approaches to build multivariate spatial models when the number of spatially dependent variables is large. These models build upon the popular linear model of coregionalization (LMC) [26, 44], which specifies the multivariate random field as a linear combination of r univariate random fields. Choosing $r\ll q$ produces low-rank or spatial factor models. Computational benefits ensue; one need specify only a small number r of univariate processes to ensure non-negative definiteness, and, hence, the approach scales to large q. While computationally convenient, multivariate spatial analysis using the LMC limits interpretation of the spatial structure of each variable. For example, the LMC endows each ${z_{j}}(s)$ with the same smoothness (the smoothness of the roughest latent process). This is implausible in most applications because different spatial variables typically exhibit very different degrees of smoothness. An alternative approach constructs cross-covariances by convolving univariate processes with kernel functions [48, 49, 36]. However, barring certain very special cases, the resulting cross-covariance functions are analytically intractable, hence less interpretable, and may require cumbersome numerical integration for estimating process parameters. Some of the aforementioned difficulties are obviated by a conditional approach developed in [12], where the univariate GPs are specified sequentially, each conditional on the previous GPs, assuming some ordering of the q variables. Other notable approaches include [1], who considered using latent dimensions to embed all the variables in a larger-dimensional space and apply standard covariance functions on this augmented space. However, this embedding restricts all pairwise correlations among the different variables to be positive. [25] and [2] directly formulated multivariate Matérn cross-covariance functions where both the univariate covariance functions for each variable and the cross-covariance functions between each pair of variables are members of the Matérn family. The multivariate Matérn GP is appealing in terms of interpretability, allowing direct spatial inference for each variable via estimates of the parameters of the corresponding component Matérn GP.
While such methods have been applied to diverse data sets, they have been restricted to a moderate number of outcomes. This article focuses on the high-dimensional multivariate setting where a significantly larger number of outcomes (tens to hundreds of variables) can be measured at each spatial location. Such settings are becoming increasingly commonplace in the environmental and physical sciences where inference is sought on large numbers of dependent outcomes. [17] developed a novel class of “Graphical Gaussian Process” (GGP) models for scalable and interpretable analysis of high-dimensional multivariate spatial data. The underlying idea is to exploit conditional independence among the variables using an undirected graph with the components of the multivariate GP as the nodes. Absence of an edge between two nodes represents conditional independence between the corresponding pair of component processes given all the other nodes.
Graphical models are extensively used to represent the joint distribution of several variables for many types of non-spatial data. Applications in spatial settings have been concerned with reducing the computational burden for large n by replacing the complete graph between locations with nearest-neighbor graphs [14]. Current multivariate GP covariance functions do not lend themselves naturally to incorporating an inter-variable graphical model. [30] proposed a method for parsimonious joint analysis of such high-dimensional multivariate spatial data via a common basis function expansion, imposing sparse graphical models for estimating the covariances of the coefficient vectors. This approach, akin to similar approaches common in multivariate functional modeling [52, 54], relies on multiple replicate measurements of each variable at each location, which may not be available in many multivariate spatial applications.
The approach of Dey et al. [17] constructs a high-dimensional multivariate model by adapting graphical Gaussian models to process-based settings and does not require replicate data. This process-level graphical model approach addresses some key properties of multivariate GPs that are deemed critical for handling highly multivariate data, including the retention of the flexibility and interpretation of spatial and non-spatial dependencies. The balance of the manuscript proceeds as follows. We offer a brief overview of inference from graphical models in spatial statistics, followed by an elucidation of the GGP and its implementation in a fully Bayesian modeling framework. We illustrate with some simulation examples, offer an analysis of a highly multivariate environmental data set, and conclude with some discussion and pointers for future research.
2 Graphical Models for Spatial Data Analysis
Undirected graphical models have an established history in spatial statistics in the context of specifying spatial dependencies via Markov random fields (MRF) for regionally aggregated or areal data [see, e.g., 6, 7, 10, 41, 22, 5, and references therein]. In areal modeling [37, 23, 28, 27] the nodes of the graph represent regions and the presence of an edge between two nodes indicates that the two regions are neighbors (or adjacent). Spatial models are constructed from this graph by assuming dependence between nodes with edges between them. For example, the popular conditional auto-regression (CAR) models assume conditional independence between two nonadjacent nodes given all other nodes.
Our current development departs from the multivariate areal setting in two aspects. First, we consider point-referenced settings where inference is desired at the process-level. Unlike areal settings where inference is limited to the fixed set of areas, point-referenced datasets allow inference about the variables over every conceivable location in an entire region. Second, unlike MRFs where the graph is on the set of areas specifying spatial dependencies, here the undirected graph posits inter-variable conditional independence relationships. Thus each node corresponds to the stochastic process describing the distribution of the respective variable over space.
In point-referenced settings, graphical models among locations based on spatial proximity have been used to specify nearest neighbor Gaussian processes (NNGP) [14, 18] that offer a scalable solution to the use of GP priors in Bayesian hierarchical models. However, akin to the MRF models for areal data, the NNGP uses a graph among the locations to parsimoniously specify spatial dependence. Neither approach models inter-variable relationships using graphs.
Recent developments for graphical modeling of multivariate functional or spatial data represent the multiple variable processes using a common univariate basis function expansion with multivariate (vector-valued) coefficients [52, 30]. Graphical modeling among the variables is induced by a series of graphical models, one for each of the vector-valued coefficients. Since, in practice, basis functions are truncated to a finite number of coefficients, the representation reduces to standard graphical model estimation for vector-valued data using graphical Lasso type techniques [19]. The advantage of the approach is that the graph is allowed to vary for each coefficient, thereby allowing conditional independence structures to be resolution-specific. However, these methods rely on replicate data on each variable at each location, which is atypical in many spatial settings. Also, the assumption of a common univariate basis function expansion for the multivariate process may be inadequate.
We first present a simple alternative approach to build a multivariate spatial model that explicitly respects a given inter-variable graphical model. Let $\mathcal{G}=\{\mathcal{V},\mathcal{E}\}$ be an undirected graph with $\mathcal{V}$ being the set of vertices and $\mathcal{E}$ the set of edges. A customary approach for developing probabilistic models on such undirected graphs is to specify full conditional distributions for each node given the others and then derive a joint density from the set of full conditional distributions. This is achieved using Brook’s Lemma [6, 37], which can be adapted to our setting as follows.
Let us consider a multivariate spatial process $z(s)$ with q univariate spatial processes and let ${z_{i}}={({z_{i}}({s_{i1}}),{z_{i}}({s_{i2}}),\dots ,{z_{i}}({s_{i{n_{i}}}}))^{\mathrm{T}}}$ be an ${n_{i}}\times 1$ random vector corresponding to the realizations of the i-th variable, ${z_{i}}(s)$, for each $i=1,2,\dots ,q$. We specify the following sequence of full conditional distributions for modeling ${z_{i}}$’s given all other ${z_{j}}$’s:
(2.1)
\[ {z_{i}}\hspace{0.1667em}|\hspace{0.1667em}{z_{-(i)}}\sim N\left({\sum \limits_{j=1}^{q}}{A_{ij}}{z_{j}},{\Gamma _{i}}\right)\hspace{0.2778em},\hspace{1em}i=1,2,\dots ,q\hspace{0.2778em},\]
where ${z_{-(i)}}=\{{z_{1}},{z_{2}},\dots ,{z_{q}}\}\setminus \{{z_{i}}\}$, i.e., the collection of ${z_{j}}$’s for $j=1,2,\dots ,q$ excluding ${z_{i}}$, the ${A_{ij}}$’s are fixed ${n_{i}}\times {n_{j}}$ matrices with ${A_{ii}}=O$ (the matrix of zeroes), and the ${\Gamma _{i}}$’s are fixed positive definite matrices. Since each variable can exhibit its own spatial dependence structure, ${\Gamma _{i}}$ varies by variable. Brook’s Lemma provides a straightforward method for deriving the joint density from (2.1) using the identity
(2.2)
\[\begin{array}{cc}& \displaystyle \pi ({z_{1}},{z_{2}},\dots ,{z_{q}})={\prod \limits_{i=1}^{q}}\frac{\pi ({z_{i}}\hspace{0.1667em}|\hspace{0.1667em}{\tilde{z}_{1}},\dots ,{\tilde{z}_{i-1}},{z_{i+1}},\dots ,{z_{q}})}{\pi ({\tilde{z}_{i}}\hspace{0.1667em}|\hspace{0.1667em}{\tilde{z}_{1}},\dots ,{\tilde{z}_{i-1}},{z_{i+1}},\dots ,{z_{q}})}\\ {} & \displaystyle \times \pi ({\tilde{z}_{1}},{\tilde{z}_{2}},\dots ,{\tilde{z}_{q}})\hspace{0.2778em},\end{array}\]
where $\tilde{z}={({\tilde{z}_{1}^{\mathrm{T}}},{\tilde{z}_{2}^{\mathrm{T}}},\dots ,{\tilde{z}_{q}^{\mathrm{T}}})^{\mathrm{T}}}$ is any fixed point in the support of $\pi (z)$ and we assume that the joint density $\pi (\cdot )\gt 0$ over its entire support. The proof is a straightforward verification proceeding from the last factor on the right hand side (i.e., the joint density evaluated at $\tilde{z}$). Note that, for $i=q$,
\[ \pi ({\tilde{z}_{1}},\dots ,{\tilde{z}_{q-1}},{z_{q}})=\frac{\pi ({z_{q}}\hspace{0.1667em}|\hspace{0.1667em}{\tilde{z}_{1}},\dots ,{\tilde{z}_{q-1}})}{\pi ({\tilde{z}_{q}}\hspace{0.1667em}|\hspace{0.1667em}{\tilde{z}_{1}},\dots ,{\tilde{z}_{q-1}})}\,\pi ({\tilde{z}_{1}},\dots ,{\tilde{z}_{q}})\hspace{0.2778em}.\]
Proceeding as above continues to replace ${\tilde{z}_{i}}$ with ${z_{i}}$ in the joint density on the right hand side of (2.2) for each $i=q-1,q-2,\dots ,1$, and we eventually arrive at $\pi ({z_{1}},{z_{2}},\dots ,{z_{q}})$.
Applying (2.2) to the full conditional distributions in (2.1) with ${\tilde{z}_{i}}=0$ for each $i=1,\dots ,q$ yields the joint density
(2.3)
\[ \begin{aligned}{}\pi (z)& \propto \exp \left\{-\frac{1}{2}\left({\sum \limits_{i=1}^{q}}{z_{i}^{\mathrm{T}}}{\Gamma _{i}^{-1}}{z_{i}}-{\sum \limits_{i=1}^{q}}{\sum \limits_{j\ne i}^{q}}{z_{i}^{\mathrm{T}}}{\Gamma _{i}^{-1}}{A_{ij}}{z_{j}}\right)\right\}\\ {} & \propto \exp \left(-\frac{1}{2}{z^{\mathrm{T}}}Qz\right)\hspace{0.2778em},\end{aligned}\]
where $z={({z_{1}^{\mathrm{T}}},\dots ,{z_{q}^{\mathrm{T}}})^{\mathrm{T}}}$ is $({\textstyle\sum _{i=1}^{q}}{n_{i}})\times 1$ and $Q={M^{-1}}(I-A)$ is $({\textstyle\sum _{i=1}^{q}}{n_{i}})\times ({\textstyle\sum _{i=1}^{q}}{n_{i}})$, with $M={\oplus _{i=1}^{q}}{\Gamma _{i}}$ block-diagonal having $(i,i)$-th block ${\Gamma _{i}}$ for $i=1,\dots ,q$, and $A=({A_{ij}})$ the $({\textstyle\sum _{i=1}^{q}}{n_{i}})\times ({\textstyle\sum _{i=1}^{q}}{n_{i}})$ block matrix with ${A_{ij}}$ as the $(i,j)$-th block. For (2.3) to be a valid density, Q must be symmetric and positive definite, in which case $z\sim N(0,{Q^{-1}})$.
How, then, can we construct Q to be symmetric and positive definite while also respecting the conditional independence relationships among the outcomes from a given undirected graph? To be precise, if two distinct nodes i and j in the graph do not have an edge, then we must ensure that ${z_{i}}\perp {z_{j}}\hspace{0.1667em}|\hspace{0.1667em}{z_{-(i,j)}}$ or, equivalently, that the ${n_{i}}\times {n_{j}}$ block of the precision matrix ${Q_{ij}}=-{\Gamma _{i}^{-1}}{A_{ij}}=O$, i.e., the $(i,j)$-th block of Q must be an ${n_{i}}\times {n_{j}}$ matrix of zeros. Note that since ${A_{ii}}=O$ in (2.1), the $(i,i)$-th block of Q is ${\Gamma _{i}^{-1}}$.
Given the inter-variable graph $\mathcal{G}=\{\mathcal{V},\mathcal{E}\}$, let $\Lambda =({\lambda _{ij}})=D-\rho W$ be the $q\times q$ graph Laplacian, where $W=({w_{ij}})$ is the adjacency matrix with nonzero ${w_{ij}}$ only if there is an edge between i and j, $D=({d_{ii}})$ is a $q\times q$ diagonal matrix with the row sums of W along the diagonal, i.e., ${d_{ii}}={\textstyle\sum _{j=1}^{q}}{w_{ij}}$, and ρ is a scalar parameter that ensures positive-definiteness of Λ as long as $\rho \in (1/{\zeta _{min}},1/{\zeta _{max}})$, where ${\zeta _{min}}$ and ${\zeta _{max}}$ are the minimum and maximum eigenvalues of ${D^{-1/2}}W{D^{-1/2}}$, respectively. Since each node corresponds to the realizations of a spatial process, which is modeled as a latent (unobserved) process, we can assume without much loss of generality that each ${z_{i}}$ is $n\times 1$, i.e., ${n_{i}}=n$ for each $i=1,\dots ,q$.
Let ${R_{i}}$ be the $n\times n$ upper triangular factor in the Cholesky decomposition of the positive definite covariance matrix of ${z_{i}}$, i.e., $\text{var}({z_{i}})={C_{ii}}={R_{i}^{\mathrm{T}}}{R_{i}}$ for each $i=1,2,\dots ,q$. We now set ${\Gamma _{i}^{-1}}={\lambda _{ii}}{R_{i}^{\mathrm{T}}}{R_{i}}$ and ${A_{ij}}=-({\lambda _{ij}}/{\lambda _{ii}}){R_{i}^{-1}}{R_{j}}$ in (2.1). Then ${Q_{ij}}={\lambda _{ij}}{R_{i}^{\mathrm{T}}}{R_{j}}$ and $Q={\tilde{R}^{\mathrm{T}}}(\Lambda \otimes I)\tilde{R}$, where $\tilde{R}={\oplus _{i=1}^{q}}{R_{i}}$. Since each ${R_{i}}$ is nonsingular and Λ is positive definite, it follows that Q is positive definite and $\det (Q)={\textstyle\prod _{i=1}^{q}}{(\det ({R_{i}}))^{2}}{(\det (\Lambda ))^{n}}$. The special case where every variable has the same spatial covariance function, so that ${R_{i}}=R$ for $i=1,2,\dots ,q$, yields the separable model $Q=\Lambda \otimes ({R^{\mathrm{T}}}R)$. We accrue computational benefits by modeling the spatial process so that the ${R_{i}}$’s are easily computed for massive data [see, e.g., 14, 4, 29, 39, and references therein]. Other, even more general, structures for Q can also be derived by adapting multivariate Markov random fields [27] to incorporate spatial processes into the nodes of a graph.
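The construction above is concrete enough to verify numerically. The following sketch (hypothetical code, not from [17]; exponential covariances, i.e., Matérn with $\nu =1/2$, stand in for general spatial covariances) assembles $Q={\tilde{R}^{\mathrm{T}}}(\Lambda \otimes I)\tilde{R}$ for a three-node path graph and checks that Q is positive definite with a zero block exactly where the edge is missing:

```python
# Sketch: Q = R~^T (Lambda kron I) R~ for the path graph 1-2-3 (no edge 1-3).
import numpy as np

rng = np.random.default_rng(0)
n, q = 50, 3
locs = rng.uniform(size=(n, 2))
dists = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)

phis = [2.0, 4.0, 6.0]                                   # variable-specific decays
# upper Cholesky factors R_i with C_ii = R_i^T R_i
Rs = [np.linalg.cholesky(np.exp(-phi * dists)).T for phi in phis]

W = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)   # adjacency: edges 1-2, 2-3
D = np.diag(W.sum(axis=1))
Dh = np.diag(1 / np.sqrt(np.diag(D)))
zeta = np.linalg.eigvalsh(Dh @ W @ Dh)
rho = 0.9 / zeta.max()                                   # inside (1/zeta_min, 1/zeta_max)
Lam = D - rho * W                                        # positive definite

Rtil = np.zeros((n * q, n * q))
for i, Ri in enumerate(Rs):
    Rtil[i * n:(i + 1) * n, i * n:(i + 1) * n] = Ri
Q = Rtil.T @ np.kron(Lam, np.eye(n)) @ Rtil

assert np.all(np.linalg.eigvalsh(Q) > 0)                 # Q positive definite
assert np.allclose(Q[0:n, 2 * n:3 * n], 0)               # no edge 1-3 => zero block
```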
The above construction yields proper densities in (2.3) that conform to conditional independence among the variables represented by the nodes of a posited undirected graph and can be adapted for analyzing high-dimensional multivariate spatial data, where one or both of n and q are large. An important drawback, however, is that it is challenging to characterize the marginal distributions of such a multivariate process parsimoniously in terms of a few parameters. To elucidate, even if ${\Gamma _{i}}$ is chosen from an interpretable parametric family of covariances like the Matérn family, the construction does not ensure that ${z_{i}}$ will retain a Matérn covariance. Similarly, the cross-covariances will not correspond to standard families of valid cross-covariance functions [24]. Furthermore, the construction of Q described above is essentially finite-dimensional, and the notion of conditional independence is restricted to the finite set of locations ${s_{1}},\dots ,{s_{n}}$. It is unclear whether this construction can lead to a highly multivariate spatial process over the entire domain $\mathcal{D}$ that respects process-level conditional independences. To see how this can be achieved, we elucidate the GGP [17] using a paradigm for graphical models fundamentally different from Markov random fields: covariance selection.
3 Graphical Gaussian Processes
Rather than building a Gaussian graphical model whose nodes are finite-dimensional random vectors, the GGP builds a graphical model whose nodes are the component Gaussian processes of a multivariate spatial GP, with edges representing process-level conditional dependence between the incident nodes given all other nodes. To do so, we first clarify what it means for two spatial processes to be conditionally independent. Let the q nodes of $\mathcal{G}$ represent q different spatial processes, $\{{z_{i}}(s):s\in \mathcal{D}\}$ for $i=1,2,\dots ,q$. Two distinct processes ${z_{i}}(\cdot )$ and ${z_{j}}(\cdot )$ are conditionally independent given the remaining $q-2$ processes, which we denote by ${z_{i}}(\cdot )\perp {z_{j}}(\cdot )\hspace{0.1667em}|\hspace{0.1667em}{z_{-(ij)}}(\cdot )$, if the covariance between ${z_{i}}(s)$ and ${z_{j}}({s^{\prime }})$ is zero for any pair of locations $s,{s^{\prime }}\in \mathcal{D}$ conditional on full realizations of all of the remaining processes. A q-variate process $z(s)$ is a GGP with respect to an inter-variable graph $\mathcal{G}=\{\mathcal{V},\mathcal{E}\}$ if ${z_{i}}(\cdot )$ and ${z_{j}}(\cdot )$ are conditionally independent given the remaining processes whenever $(i,j)\notin \mathcal{E}$.
Any collection of q independent spatial processes is a GGP with respect to any graph $\mathcal{G}$, but such a trivial GGP is of limited use as it does not leverage inter-variable dependencies. Sparse graphical models are often used as parsimonious working models serving as scalable approximations to a richer (or fuller) data generating model. Hence, the more relevant question is: given a reasonable inter-variable graph $\mathcal{G}$, possibly inferred from scientific knowledge, how do we construct a GGP that respects $\mathcal{G}$ and best approximates a multivariate GP with a given cross-covariance function? Thus, we wish to construct a multivariate spatial process that conforms to the posited graphical relationships among the q dependent variables and best approximates the spatial structure of a given $q\times q$ cross-covariance matrix function $C(s,{s^{\prime }})=({C_{ij}}(s,{s^{\prime }}))$. [17] proved that such an optimal GGP exists and is unique: it is the one that preserves the marginal spatial covariance structure ${C_{ii}}(s,{s^{\prime }})$ of each variable i as well as the cross-covariances ${C_{ij}}(s,{s^{\prime }})$ for all pairs of variables $(i,j)\in \mathcal{E}$.
In this regard, we recall a seminal paper by Dempster [16] on covariance selection. The key result can be described as follows. Let $\mathcal{G}=\{\mathcal{V},\mathcal{E}\}$ be any undirected graph with q nodes and let $F=({f_{ij}})$ be any positive definite covariance matrix whose elements ${f_{ij}}$ give the covariance between random variables at nodes i and j. Then, the best approximation of F (in terms of Kullback–Leibler divergence) among the class of covariance matrices satisfying the conditional independence relationships specified by the Gaussian graphical model on $\mathcal{G}$ is a unique positive definite matrix $\tilde{F}=({\tilde{f}_{ij}})$ such that:
1. ${\tilde{f}_{ii}}={f_{ii}}$ for all $i\in \mathcal{V}$;
2. ${\tilde{f}_{ij}}={f_{ij}}$ for all $(i,j)\in \mathcal{E}$; and
3. ${({\tilde{F}^{-1}})_{ij}}=0$ for all $i\ne j$ with $(i,j)\notin \mathcal{E}$.
The first two conditions state that the optimal graphical model preserves all marginal variances and the cross-covariances for pairs joined by an edge, while the third ensures adherence to the conditional independence relations specified by $\mathcal{G}$. Observe that covariance selection does not explicitly require Markovian assumptions, nor does it require modeling full conditionals as in (2.1).
Covariance selection, as described above, is applicable to finite-dimensional inference, where, for example, we restrict attention to parameter estimation and random effects at a specified set of n spatial locations. Let $C=(C({s_{i}},{s_{j}}))$ be an $nq\times nq$ spatial covariance matrix, where each $C({s_{i}},{s_{j}})$ is a $q\times q$ submatrix evaluated using a valid cross-covariance function. Given an undirected graph that posits conditional independence relations among the q variables, we can compute the unique $\tilde{C}$ preserving marginal variances and covariances and conforming to the graph using an iterative proportional scaling (IPS) algorithm [45, 51].
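A minimal scalar-node sketch of the IPS idea follows (assumed illustration; Algorithm 1, referenced below, operates on the $n\times n$ blocks of each variable, so each "node" there expands to n indices). Starting from a diagonal precision, each pass rescales one clique block so its implied marginal matches F; entries outside the cliques remain zero throughout:

```python
# Sketch of iterative proportional scaling (IPS) for covariance selection.
import numpy as np

def ips(F, cliques, tol=1e-10, max_iter=500):
    """KL-optimal approximation of covariance F under the graphical model
    whose edges are covered by `cliques` (lists of variable indices)."""
    K = np.diag(1.0 / np.diag(F))                # diagonal start: off-graph zeros
    for _ in range(max_iter):
        K_old = K.copy()
        for c in cliques:
            idx = np.ix_(c, c)
            Sigma = np.linalg.inv(K)             # covariance implied by current K
            # force the clique marginal of the model to match F exactly
            K[idx] += np.linalg.inv(F[idx]) - np.linalg.inv(Sigma[idx])
        if np.max(np.abs(K - K_old)) < tol:
            break
    return np.linalg.inv(K)                      # F~, the covariance-selection solution

# toy check with cliques {1,2}, {2,3}, {3,4} (0-indexed)
A = np.random.default_rng(1).normal(size=(4, 6))
F = A @ A.T + np.eye(4)
Ft = ips(F, cliques=[[0, 1], [1, 2], [2, 3]])
print(np.max(np.abs(np.diag(Ft) - np.diag(F))))           # ~0: variances preserved
print(np.linalg.inv(Ft)[0, 2], np.linalg.inv(Ft)[0, 3])   # ~0: graph zeros
```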
Consider the graph in Figure 1 specifying the conditional independence structure among 4 variables. We consider each variable being observed at 10 locations uniformly sampled from the unit square $(0,1)\times (0,1)$. We assume a multivariate Matérn covariance structure [2] between the processes and compute the $40\times 40$ cross-covariance matrix C. Algorithm 1 then lays out the steps to obtain the unique cross-covariance matrix $\tilde{C}$ corresponding to the GGP that conforms to the graph in Figure 1. The resulting precision matrix, plotted in Figure 2, has zero entries only in the $(1,3)$- and $(2,4)$-th blocks, i.e., exactly where there are no edges between the variable nodes.
Figure 2
$40\times 40$ precision matrix of the GGP after the IPS algorithm. Zero entries are plotted as circles and non-zero entries as dots.
However, this finite-dimensional application of covariance selection or IPS does not immediately extend to the process-level formulation of conditional dependence in a multivariate GP. We now illustrate how the GGP of [17] extends covariance selection to an infinite-dimensional framework that allows predictive inference at arbitrary spatial locations over the entire domain of interest $\mathcal{D}$.
We begin the construction with a given inter-variable graph $\mathcal{G}$, a multivariate cross-covariance function $C(s,{s^{\prime }})$, and a finite set of locations $\mathcal{L}$. Using covariance selection, we obtain a covariance matrix $\tilde{C}:=\tilde{C}(\mathcal{L})$ with which we can define a multivariate spatial model on $\mathcal{L}$ as
(3.1)
\[ \tilde{w}(\mathcal{L})={({\tilde{w}_{1}}{(\mathcal{L})^{\mathrm{T}}},\dots ,{\tilde{w}_{q}}{(\mathcal{L})^{\mathrm{T}}})^{\mathrm{T}}}\sim N(0,\tilde{C})\hspace{0.2778em}.\]
The properties of covariance selection ensure that ${\tilde{w}_{i}}(\mathcal{L})\sim N(0,{C_{ii}}(\mathcal{L}))$, $\text{Cov}({\tilde{w}_{i}}(\mathcal{L}),{\tilde{w}_{j}}(\mathcal{L}))={C_{ij}}(\mathcal{L})$ for all $(i,j)\in \mathcal{E}$, and, for all $(i,j)\notin \mathcal{E}$, ${({\tilde{C}^{-1}})_{ij}}=O$, i.e., $\text{Cov}({\tilde{w}_{i}}(\mathcal{L}),{\tilde{w}_{j}}(\mathcal{L})\hspace{0.1667em}|\hspace{0.1667em}{\tilde{w}_{-(ij)}}(\mathcal{L}))=O$. Thus, on $\mathcal{L}$, each component of $\tilde{w}$ retains the marginal covariances specified by C, the cross-covariances for variable pairs included in the graph are preserved, and the conditional dependencies align with the graph.
To extend this finite-dimensional framework on $\mathcal{L}$ to a process-level graphical model for a multivariate GP on $\mathcal{D}$, we leverage the property that a univariate GP ${w_{i}}(s)$ with covariance function ${C_{ii}}$ can be constructed as the sum of two orthogonal GPs, ${w_{i}^{\ast }}(s)$ and ${r_{i}}(s)$. Here ${w_{i}^{\ast }}(s)$ is a finite-rank predictive process that can be represented as ${w_{i}^{\ast }}(s)={b_{i}}{(s)^{\mathrm{T}}}{\tilde{w}_{i}}(\mathcal{L})$, where ${\tilde{w}_{i}}(\mathcal{L})$ is a random vector such that ${\tilde{w}_{i}}(\mathcal{L})\stackrel{d}{=}{w_{i}}(\mathcal{L})$ and ${b_{i}}(s)$ is chosen so that the predictive process ${w_{i}^{\ast }}(\cdot )$ agrees with ${w_{i}}(\cdot )$ on $\mathcal{L}$ [3]. The second GP, ${r_{i}}(\cdot )$, is often referred to as the residual GP, as it explains the residual covariance among realizations of ${w_{i}}(\cdot )$ beyond what is explained by the predictive process. Choosing the ${\tilde{w}_{i}}(\mathcal{L})$’s to be the components of $\tilde{w}(\mathcal{L})$ from (3.1) and the ${r_{i}}(\cdot )$’s to be independent across variables, one creates a multivariate GP $w(s)=B(s)\tilde{w}(\mathcal{L})+r(s)$, where $B(s)={\oplus _{i=1}^{q}}{b_{i}}{(s)^{\mathrm{T}}}$ and $r(s)={({r_{1}}(s),\dots ,{r_{q}}(s))^{\mathrm{T}}}$.
Covariance selection and the predictive process harmonize to ensure that the resulting process $w(\cdot )$ exactly or approximately satisfies all three criteria for being an optimal multivariate process given a covariance function C and a graphical model $\mathcal{G}$. Each component process satisfies ${w_{i}}(s)\sim GP(0,{C_{ii}})$, thereby preserving the marginals. Cross-covariances for $(i,j)\in \mathcal{E}$ are also preserved exactly on $\mathcal{L}$ and approximately elsewhere when $\mathcal{L}$ is chosen to be sufficiently representative of the domain. Finally, the conditional independence of $\tilde{w}(\mathcal{L})$ with respect to $\mathcal{G}$ induced by covariance selection, together with the component-wise independent residual processes ${r_{i}}(\cdot )$, ensures that the resulting multivariate process $w(\cdot )$ conforms to the process-level conditional independences specified by $\mathcal{G}$ and is thus a GGP. This construction is referred to as stitching: the multiple processes can be envisioned as multiple layers of fabric that are connected to each other only at the locations in $\mathcal{L}$, via the edges of $\mathcal{G}$ serving as threads.
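For a single variable, stitching can be sketched as follows (assumed illustration with an exponential covariance; names like cov are ours). The vector $b_i(s)$ is the usual kriging weight ${C_{ii}}(s,\mathcal{L}){C_{ii}}{(\mathcal{L},\mathcal{L})^{-1}}$, and the residual covariance is the corresponding Schur complement, so the stitched draw has exactly the marginal covariance ${C_{ii}}$:

```python
# Sketch: stitch w_i(s) = b_i(s)^T w~_i(L) + r_i(s) for one variable.
import numpy as np

rng = np.random.default_rng(2)
phi = 3.0
L = rng.uniform(size=(30, 2))               # reference locations
S = rng.uniform(size=(200, 2))              # arbitrary locations in the domain

def cov(x, y):
    d = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    return np.exp(-phi * d)                 # exponential (Matern nu = 1/2)

C_LL, C_SL, C_SS = cov(L, L), cov(S, L), cov(S, S)
B = C_SL @ np.linalg.inv(C_LL)              # rows are b_i(s)^T
R = C_SS - B @ C_SL.T                       # residual covariance (Schur complement)

# in the GGP, w~(L) is drawn jointly across variables from N(0, C~);
# here we draw a single component for illustration
w_L = np.linalg.cholesky(C_LL) @ rng.normal(size=30)
r = np.linalg.cholesky(R + 1e-8 * np.eye(len(S))) @ rng.normal(size=len(S))
w_S = B @ w_L + r                           # Cov(w_S) = B C_LL B^T + R = C_SS
```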
3.1 Inference from the Graphical Matérn GP
We consider a spatial dataset on q outcomes. Each variable is recorded at a set of locations ${\mathcal{D}_{i}}$ and is associated with a set of covariates ${X_{i}}$. Both the measurement-locations and covariates are allowed to be variable-specific. If a sparse graphical model $\mathcal{G}$ can be justified, empirically or scientifically, among the q variables, then a GGP model for analyzing such data will be specified at the process-level as
(3.2)
\[\begin{aligned}{}{y_{i}}(s)& ={X_{i}}{(s)^{\mathrm{T}}}{\beta _{i}}+{w_{i}}(s)+{\epsilon _{i}}(s)\hspace{2.5pt}\text{for}\hspace{2.5pt}i=1,\dots ,q,\\ {} {\epsilon _{i}}(s)& \stackrel{iid}{\sim }N(0,{\tau _{i}^{2}})\hspace{2.5pt}\text{for}\hspace{2.5pt}i=1,\dots ,q,\hspace{2.5pt}s\in \mathcal{D},\\ {} w(\cdot )& ={({w_{1}}(\cdot ),\dots ,{w_{q}}(\cdot ))^{\mathrm{T}}}\sim GGP(0,C,\mathcal{G}).\end{aligned}\]
Here C denotes the full q-variate cross-covariance function used to derive the covariance of the GGP. Without loss of generality, we use a reference set $\mathcal{L}$ that does not overlap with the data locations. The data likelihood corresponding to this hierarchical model can be formulated as
(3.3)
\[ \begin{aligned}{}{\prod \limits_{i=1}^{q}}& \prod \limits_{s\in {\mathcal{D}_{i}}}N({y_{i}}(s)\hspace{0.1667em}|\hspace{0.1667em}{X_{i}}{(s)^{\mathrm{T}}}{\beta _{i}}+{w_{i}}(s),{\tau _{i}^{2}})\\ {} & \hspace{2em}\times {\prod \limits_{i=1}^{q}}N({w_{i}}({\mathcal{D}_{i}})\hspace{0.1667em}|\hspace{0.1667em}{B_{i}}{w_{i}}(\mathcal{L}),{R_{ii}})\times N(w(\mathcal{L})\hspace{0.1667em}|\hspace{0.1667em}0,\tilde{C})\hspace{0.2778em}.\end{aligned}\]
Here ${B_{i}}$ stacks up the ${b_{i}}{(s)^{\mathrm{T}}}$’s for $s\in {\mathcal{D}_{i}}$ and ${R_{ii}}$ is the covariance matrix of the residual process ${r_{i}}(\cdot )$ over ${\mathcal{D}_{i}}$.
If C is chosen from the popular multivariate Matérn class, the resulting cross-covariance for the GGP is termed the graphical Matérn and retains the desirable properties of the multivariate Matérn family. In particular, each component process ${w_{i}}(\cdot )$ is exactly a Matérn GP with its own set of parameters that allows interpretation of the spatial variance, smoothness, and decay of the process. The cross-covariances for variable pairs included in $\mathcal{G}$ are also exactly or approximately Matérn, while, unlike the multivariate Matérn, the graphical Matérn allows process-level conditional independencies.
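For reference, one common parameterization of the univariate Matérn covariance underlying these components can be sketched as follows (conventions vary; here $\phi$ is the decay parameter used in this article, and the code is our illustration rather than the authors' implementation):

```python
# One common Matern parameterization; nu = 0.5 recovers sigma2 * exp(-phi * d).
import numpy as np
from scipy.special import gamma, kv

def matern(d, sigma2=1.0, phi=1.0, nu=0.5):
    d = np.asarray(d, dtype=float)
    out = np.full(d.shape, sigma2)          # C(0) = sigma2
    pos = d > 0
    x = phi * d[pos]
    out[pos] = sigma2 * (2 ** (1 - nu) / gamma(nu)) * (x ** nu) * kv(nu, x)
    return out
```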
For a general graph, the data likelihood for the GGP in (3.3) involves manipulation of the $O(q)\times O(q)$ matrix $\tilde{C}$ (suppressing the dependence on the number of locations n, which, for this article, is assumed to be small or moderate). This matrix is derived from the full covariance matrix $C(\mathcal{L},\mathcal{L})$ using the IPS algorithm. However, in its full generality, the IPS requires $O({q^{2}})$ floating point operations (FLOPS) [51]. Additionally, the parent covariance C needs to represent a valid cross-covariance class, which, for the multivariate Matérn family, involves $O({q^{2}})$ parameters with constraints that require $O({q^{3}})$ FLOPS to assess. Hence, in highly multivariate settings, a GGP with a general graph can remain computationally prohibitive in terms of both memory and time.
The computational advantages of a GGP with a sparse graphical model are maximized for decomposable (or chordal) graphs. Such graphs are popular in Bayesian graphical models because of their computational conveniences and are often justifiably used in place of non-chordal graphs, as any non-chordal graph can be embedded in a chordal one. Decomposable graphs can be represented in terms of a set of cliques K and a set of separators S such that the likelihood for the GGP on $\mathcal{L}$ can be decomposed as
(3.4)
\[ N(w(\mathcal{L})\hspace{0.1667em}|\hspace{0.1667em}0,\tilde{C})=\frac{{\textstyle\prod _{A\in K}}N({w_{A}}(\mathcal{L})\hspace{0.1667em}|\hspace{0.1667em}0,C)}{{\textstyle\prod _{A\in S}}N({w_{A}}(\mathcal{L})\hspace{0.1667em}|\hspace{0.1667em}0,C)}\hspace{0.2778em},\]
where for any $A\subset \{1,\dots ,q\}$, ${w_{A}}(\mathcal{L})$ denotes the subset of $w(\mathcal{L})$ corresponding to the indices in A. Note that this obviates the need to explicitly obtain the $O(q)$-dimensional matrix $\tilde{C}$ via the IPS algorithm, as the right hand side can be written in terms of multivariate densities based on the parent covariance C. The cost of inverting any matrix involved is $O({c^{3}})$ (again suppressing n), where c denotes the largest clique size, and there are at most $O(q)$ such matrices. Also, the resulting likelihood depends only on the cross-covariances ${C_{ij}}$ for either $i=j$ or $(i,j)\in \mathcal{E}$. Thus the full parent multivariate Matérn GP with $O({q^{2}})$ parameters need not be specified; only $O(q+|\mathcal{E}|)$ parameters are required, resulting in significant dimension reduction for sparse graphs. The GGP likelihood with a Matérn covariance and a sparse graph $\mathcal{G}$ offers drastic reductions in memory, time, and parameter dimensionality while retaining the benefits of spatial modeling with the Matérn family.
3.2 Implementation
We elucidate, with examples, the construction of a GGP given, in particular, a multivariate Matérn cross-covariance function and an undirected decomposable graph.
3.2.1 Setup
Here, we consider $q=10$ variables, with each variable observed at 250 locations uniformly chosen from the unit square $(0,1)\times (0,1)$. The latent spatial random process $w(s)$ in (3.2) is taken as the $10\times 1$ multivariate graphical Matérn GP [17] with respect to a decomposable variable graph $\mathcal{G}$ (Figure 3). The marginal scale, variance, and smoothness parameters of each component Matérn process ${w_{i}}(\cdot )$ are denoted by ${\phi _{ii}}$, ${\sigma _{ii}}$, and ${\nu _{ii}}$, respectively, for each $i=1,2,\dots ,q$.
To ensure a valid multivariate Matérn cross-covariance function, a sufficient condition is to limit the intra-site covariance matrix $\Sigma =({\sigma _{ij}})$ to be [see Theorem 1 of 2]
(3.5)
\[ {\sigma _{ij}}={b_{ij}}\frac{\Gamma (\frac{1}{2}({\nu _{ii}}+{\nu _{jj}}+d))\Gamma ({\nu _{ij}})}{{\phi _{ij}^{2{\Delta _{A}}+{\nu _{ii}}+{\nu _{jj}}}}\Gamma ({\nu _{ij}}+\frac{d}{2})}\hspace{0.2778em},\]
where ${\Delta _{A}}\ge 0$ and $B=({b_{ij}})\gt 0$, i.e., is positive definite. This implies $\Sigma =(B\odot ({\gamma _{ij}}))$, where the ${\gamma _{ij}}$’s are constants collecting the components in (3.5) involving only the ${\nu _{ij}}$’s and ${\phi _{ij}}$’s, and ⊙ denotes the Hadamard (element-wise) product. For simplicity, we assume ${\nu _{ii}}={\nu _{jj}}={\nu _{ij}}=0.5$ and ${\phi _{ij}^{2}}=({\phi _{ii}^{2}}+{\phi _{jj}^{2}})/2$. The constraints in (3.5) then simplify to
(3.6)
\[ {\sigma _{ij}}={({\sigma _{ii}}{\sigma _{jj}})^{0.5}}\frac{{\phi _{ii}^{0.5}}{\phi _{jj}^{0.5}}}{{\phi _{ij}^{0.5}}}{r_{ij}}\hspace{0.2778em},\]
where $R=({r_{ij}})$ is positive definite. Hence, as part of our simulation exercise, we only need to estimate the marginal scale (${\phi _{ii}}$) and variance (${\sigma _{ii}}$) parameters and the cross-correlation parameters ${r_{ij}}$.
Observe that the graph in Figure 3 posits the full set of conditional dependencies among the 10 spatially indexed variables. We now calculate a perfect ordering of the cliques of $\mathcal{G}$ to obtain the cliques and separators (Figure 4). Using (3.4), the likelihood of the decomposable GGP can be written as
(3.7)
\[ \begin{aligned}{}N(w(\mathcal{L})\hspace{0.1667em}|\hspace{0.1667em}0,\tilde{C})& =\frac{N({w_{(1,2,3)}}(\mathcal{L})\hspace{0.1667em}|\hspace{0.1667em}0,C)\hspace{0.1667em}N({w_{(2,3,4)}}(\mathcal{L})\hspace{0.1667em}|\hspace{0.1667em}0,C)\hspace{0.1667em}N({w_{(4,5,6)}}(\mathcal{L})\hspace{0.1667em}|\hspace{0.1667em}0,C)}{N({w_{(2,3)}}(\mathcal{L})\hspace{0.1667em}|\hspace{0.1667em}0,C)\hspace{0.1667em}N({w_{(4)}}(\mathcal{L})\hspace{0.1667em}|\hspace{0.1667em}0,C)}\\ {} & \hspace{1em}\times \frac{N({w_{(6,7,8)}}(\mathcal{L})\hspace{0.1667em}|\hspace{0.1667em}0,C)\hspace{0.1667em}N({w_{(8,9,10)}}(\mathcal{L})\hspace{0.1667em}|\hspace{0.1667em}0,C)}{N({w_{(6)}}(\mathcal{L})\hspace{0.1667em}|\hspace{0.1667em}0,C)\hspace{0.1667em}N({w_{(8)}}(\mathcal{L})\hspace{0.1667em}|\hspace{0.1667em}0,C)}\hspace{0.2778em}.\end{aligned}\]
3.2.2 Data Generation
We choose the length-10 vectors of marginal variance (${\sigma _{ii}}$) and scale (${\phi _{ii}}$) parameters as two different permutations of the equally spaced sequence $(1,1.444,1.889,2.333,2.778,3.222,3.667,4.111,4.556,5)$. The cross-correlation matrix R in (3.6) is generated as the standardized random matrix ${R_{0}}{R_{0}^{\mathrm{T}}}$, where ${R_{0}}$ has independent entries from the $Uniform(-1,1)$ distribution.
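The following sketch (our illustration of the recipe just described) generates R and assembles the intra-site covariances ${\sigma _{ij}}$ via (3.6); note that np.linspace(1, 5, 10) reproduces the equally spaced sequence above:

```python
# Sketch of the data-generation step for the cross-correlation structure.
import numpy as np

rng = np.random.default_rng(3)
q = 10
R0 = rng.uniform(-1, 1, size=(q, q))
M = R0 @ R0.T
d = np.sqrt(np.diag(M))
R = M / np.outer(d, d)                           # standardized: unit diagonal, PD

sigma = rng.permutation(np.linspace(1, 5, q))    # marginal variances sigma_ii
phi = rng.permutation(np.linspace(1, 5, q))      # marginal scales phi_ii
phi_ij = np.sqrt((phi[:, None] ** 2 + phi[None, :] ** 2) / 2)
Sigma = (np.sqrt(np.outer(sigma, sigma))         # eq. (3.6), elementwise
         * np.sqrt(np.outer(phi, phi)) / np.sqrt(phi_ij) * R)
```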
The precision matrix of the GGP $w(\mathcal{L})$ is calculated as [see Lemma 5.5 of 31]
(3.8)
\[ \begin{aligned}{}\tilde{C}{(\mathcal{L},\mathcal{L})^{-1}}& ={\sum \limits_{m=1}^{p}}{[{{C_{[{K_{m}}]}}(\mathcal{L},\mathcal{L})^{-1}}]^{\mathcal{V}\times \mathcal{L}}}\\ {} & \hspace{2em}-{\sum \limits_{m=2}^{p}}{[{{C_{[{S_{m}}]}}(\mathcal{L},\mathcal{L})^{-1}}]^{\mathcal{V}\times \mathcal{L}}}\hspace{0.2778em},\end{aligned}\]
where, for any symmetric matrix $A=({a_{ij}})$ with rows and columns indexed by $\mathcal{U}\subset \mathcal{V}\times \mathcal{L}$, ${A^{\mathcal{V}\times \mathcal{L}}}$ is the $|\mathcal{V}\times \mathcal{L}|\times |\mathcal{V}\times \mathcal{L}|$ matrix with ${({A^{\mathcal{V}\times \mathcal{L}}})_{ij}}={a_{ij}}$ if $(i,j)\in \mathcal{U}$ and ${({A^{\mathcal{V}\times \mathcal{L}}})_{ij}}=0$ otherwise. Equations (3.7) and (3.8) show that inverting the full GGP cross-covariance matrix requires inverting only the clique- and separator-specific covariance matrices. Hence, the computational complexity of evaluating the likelihood of a multivariate GGP reduces to $O({n^{3}}{c^{3}})$, e.g., $O({250^{3}}\times {3^{3}})$ in our example, since the maximum clique size in Figure 4 is 3. By contrast, the likelihood of a full multivariate Matérn GP would require $O({n^{3}}{q^{3}})$ computations, i.e., $O({250^{3}}\times {10^{3}})$.
We use the precision matrix in (3.8) to simulate the latent process $w(\cdot )$ as a 2500-dimensional ($250\times 10$) multivariate normal vector. Next, we use (3.2) to simulate the multivariate outcomes ${y_{i}}(\cdot ),\hspace{2.5pt}i=1,\dots ,10$. To do so, we generate random covariates ${X_{i}}(\cdot )$ from a $N(0,2)$ distribution and simulate the error processes ${\epsilon _{i}}(\cdot )$ independently from $N(0,{\tau _{i}^{2}})$, taking ${\tau _{i}^{2}}={\tau ^{2}}=0.25$ for all i.
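A sketch of this computation follows (our illustration; cov_block is a hypothetical stand-in for evaluating the parent cross-covariance C over the variables in a clique or separator at all n reference locations):

```python
# Sketch: assemble the GGP precision via (3.8) from clique/separator inverses.
import numpy as np

n, q = 250, 10
cliques = [[1, 2, 3], [2, 3, 4], [4, 5, 6], [6, 7, 8], [8, 9, 10]]
separators = [[2, 3], [4], [6], [8]]

def pad(Kblock, A):
    """Embed a small inverse into the full nq x nq matrix (zeros elsewhere)."""
    out = np.zeros((n * q, n * q))
    idx = np.concatenate([np.arange((a - 1) * n, a * n) for a in A])
    out[np.ix_(idx, idx)] = Kblock
    return out

def precision(cov_block):
    Qt = sum(pad(np.linalg.inv(cov_block(A)), A) for A in cliques)
    Qt -= sum(pad(np.linalg.inv(cov_block(A)), A) for A in separators)
    return Qt        # = C~(L, L)^{-1}; only clique-sized inverses are needed

# a draw w(L) ~ N(0, C~) then follows from the Cholesky factor of the precision:
# with Qt = L L^T, w = solve(L^T, z) for z ~ N(0, I) has covariance Qt^{-1}
```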
To assess the predictive efficiency of our method, we randomly hold out $20\% $ of the locations for each outcome variable and treat them as missing. We treat the full set of 250 locations as our reference set $\mathcal{L}$ and predict the held-out observations at every step of our sampler for later comparison with the true values.
3.2.3 Data Analysis
The analysis of our simulated GGP data can be broken down into the following steps:
1. Marginal parameter estimation: We estimate the marginal scale (${\phi _{ii}}$), variance (${\sigma _{ii}^{2}}$), and smoothness (${\nu _{ii}}$) parameters of the component Gaussian processes. We also estimate the error variance (${\tau _{i}^{2}}$) for each marginal process (details in Algorithm 2).
2. Gibbs sampler initialization: We process the variable graph to calculate its cliques and separators. Moreover, we color the nodes of the variable graph, which allows us to simulate the latent processes belonging to the same color in parallel in the Gibbs sampler. We also construct a new edge-graph ${\mathcal{G}_{E}}(\mathcal{G})=({E_{\mathcal{V}}},{E^{\ast }})$ on the set of edges ${E_{\mathcal{V}}}$, i.e., there is an edge $((i,j),({i^{\prime }},{j^{\prime }}))$ in ${\mathcal{G}_{E}}(\mathcal{G})$ if $\{i,{i^{\prime }},j,{j^{\prime }}\}$ lie in some clique K of $\mathcal{G}$. This edge-graph facilitates parallel updates of the cross-correlation parameters corresponding to edges of the same color. These are specific examples of chromatic Gibbs samplers and significantly improve sampling speed on parallel architectures; a minimal coloring sketch appears after this list. We also initialize the cross-correlation parameters at this step to start the Gibbs sampler. The details are laid out in Algorithm 2.
3. Gibbs sampler: We run our Gibbs sampler to sample the latent spatial processes, predict the test observations, and sample the latent correlations. The detailed steps are laid out in Algorithm 3. Sampling the cross-correlation parameters requires only checking positive-definiteness of the clique-specific cross-correlation parameter matrix (at most $3\times 3$ here). For the likelihood calculation, the largest matrix inversion across all these updates is of order $750\times 750$, corresponding to the largest clique; the largest matrix that needs storing is also of dimension $750\times 750$. These yield an appreciable reduction in computation relative to any multivariate Matérn model, which involves $2500\times 2500$ matrices and positive-definiteness checks of $10\times 10$ matrices at every iteration.
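As promised above, here is a minimal greedy-coloring sketch (our illustration) of the kind used for the chromatic updates: nodes of the same color share no edge and can therefore be updated in parallel, and the same routine applies to the edge-graph ${\mathcal{G}_{E}}(\mathcal{G})$ for the cross-correlation updates:

```python
# Sketch: greedy graph coloring for chromatic Gibbs updates.
def greedy_coloring(nodes, edges):
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    color = {}
    for v in nodes:                       # any fixed order; heuristics can help
        used = {color[u] for u in adj[v] if u in color}
        color[v] = next(c for c in range(len(adj)) if c not in used)
    return color

# e.g., a path graph on 10 nodes needs only two colors, so the latent
# processes split into two batches that are refreshed in parallel
colors = greedy_coloring(list(range(1, 11)), [(i, i + 1) for i in range(1, 10)])
```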
We run the sampler in Algorithm 3 for 1000 iterations and obtain the cross-correlation parameter estimates and test set predictions after a burn-in of 150 samples. Figure 7 shows that we accurately estimate the marginal micro-ergodic parameters (${\sigma _{ii}}\times {\phi _{ii}}$). In Figure 8, we plot the edge-specific estimates of the cross-correlation parameters ${r_{ij}}$. We observe that the GGP accurately estimates the cross-correlation parameters for the edges in the graph $\mathcal{G}$ (the $95\% $ credible intervals of all but one estimate contain the true values). We also create a grid of plots across the 10 variables comparing the true test data values and the predicted values from our algorithm (Figure 9). The points align closely along the $y=x$ line, and the mean squared error varies from 0.403 to 1.064. Figures 10 and 11 depict two instances of the convergence of the cross-correlation parameter chains.
Figure 8
Estimates of the cross-correlation parameters ${r_{ij}}$ with error bars denoting $95\% $ credible intervals; the red line denotes the $y=x$ line.
4 Environmental Data Example
We demonstrate the practical use of the GGP for modeling non-stationary spatial time-series data using daily PM2.5 measurements from monitoring stations in 11 northeastern states of the contiguous US and Washington, DC, for February 2020. The data are publicly available from the United States Environmental Protection Agency (EPA) website. Our current analysis focuses on $n=99$ monitoring stations that recorded at least 20 days of data in both 2019 and 2020. From the National Center for Environmental Prediction’s (NCEP) North American Regional Reanalysis (NARR) database, we obtained daily values of meteorological factors (temperature, barometric pressure, wind-speed, and relative humidity) that are posited to influence PM2.5 levels. We used multilevel B-spline smoothing via the MBA package in R to impute daily values of these variables and merge them with the EPA data. To account for baseline levels and weekly periodicity, we included a 7-day moving average of the PM2.5 data for each station and day in 2020, centered around the same day of 2019, and subtracted the day-of-the-week specific means from the raw PM2.5 values. We then analyzed the resulting highly multivariate (28-dimensional) spatial data set over $n=99$ locations.
After exploratory analysis revealed autocorrelations among the pollutant processes on consecutive days (at both lags 1 and 2), even after adjusting for covariates, we employed a graphical Matérn GP with an $AR(2)$ graph (Figure 12). The marginal parameters for day t are ${\sigma _{tt}}$, ${\phi _{tt}}$, and ${\tau _{t}^{2}}$, and the autoregressive cross-covariances between day t and days $t-1$ and $t-2$ are denoted by ${b_{t-1,t}}$ and ${b_{t-2,t}}$, respectively. The GGP enables us to model non-separability of the auto-covariances across space and time, as well as time-varying marginal spatial parameters and autoregressive coefficients.
The GGP requires only 53 cross-covariance parameters (27 for lag 1 and 26 for lag 2). As the largest clique size in an $AR(2)$ graph is 3, the largest matrix one must handle for the data at 99 stations is only $297\times 297$. We present the estimates and credible intervals for the lag 1 and lag 2 autocorrelation parameters ${r_{t-1,t}}$ and ${r_{t-2,t}}$ (normalized ${b_{t-1,t}}$ and ${b_{t-2,t}}$) obtained from the GGP in Figures 13 and 14, respectively. These estimates display substantial variation across time, with many spikes indicating high positive autocorrelation. Specifically, the $95\% $ Bayesian credible intervals for 6 of the 27 ($22\% $) lag 1 estimates and 4 of the 26 ($15\% $) lag 2 estimates exclude 0, providing strong evidence of non-stationary autocorrelation across time. The presence of significant lag 2 autocorrelations justifies our choice of the $AR(2)$ graph. The graphical Matérn GP also yields impressive predictive performance ($93\% $ coverage) on hold-out data (Figure 15).
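The $AR(2)$ variable graph itself is simple to write down; the following sketch (our illustration) enumerates its edges, cliques, and separators and recovers the parameter count quoted above:

```python
# Sketch: the AR(2) graph on q = 28 days; edges at lags 1 and 2.
q = 28
edges = [(t, t + 1) for t in range(1, q)] + [(t, t + 2) for t in range(1, q - 1)]
cliques = [[t, t + 1, t + 2] for t in range(1, q - 1)]   # largest clique size 3
separators = [[t, t + 1] for t in range(2, q - 1)]
print(len(edges))   # 27 + 26 = 53: one cross-covariance parameter per edge
```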
5 Discussion
This article has developed and elucidated, with examples, the construction of multivariate Gaussian processes in which associations among variables are modeled by a valid spatial cross-covariance function while conditional independence between variables conforms to a posited undirected graph. The sparsity of the posited graph yields substantial computational gains and makes the proposed approach especially attractive for datasets with very large numbers of spatially dependent outcomes. The algorithms implemented here are especially designed to exploit the structure of decomposable graphs. Our examples demonstrate the algorithmic efficiency of the chromatic Gibbs samplers used to update the latent process and the cross-covariance parameters, as well as the inferential efficiency, in terms of estimation and prediction, of the graphical Gaussian process model. Future avenues for research include incorporating scalable spatial processes, such as the Nearest-Neighbor Gaussian Process [15, 13], for spatial and spatiotemporal processes, and extensive comparisons with competing methods for multivariate outcomes, such as factor models, from computational and inferential standpoints.