The supplementary material contains detailed summary results for each metric and dataset used in the study. It also contains a summary of data-generating models for each of the datasets.

Bayesian model averaging (BMA) provides a coherent way to account for model uncertainty in statistical inference tasks. BMA requires specification of model space priors and parameter space priors. In this article we focus on comparing different model space priors in the presence of model uncertainty. We consider eight reference model space priors used in the literature and three adaptive parameter priors recommended by Porwal and Raftery [

Analysis of data in the presence of model uncertainty is a critical problem in statistical modeling applications. Accounting for model uncertainty, rather than selecting a single statistical model, improves predictive performance and robustness in estimation and inference of model parameters [

One common instance of model uncertainty is that of variable selection in the linear regression model. Given an

The Bayesian framework provides a straightforward way to account for model uncertainty by treating the model as a parameter itself, using Bayesian model averaging (BMA) [

Several default parameter prior choices have been proposed in the last thirty years (see Porwal and Raftery [

We compare combinations of three default parameter priors with eight choices of model space priors that have been advocated in the literature. These model space priors correspond to different flavors of Bayesian inference: (i) fixed hyper-parameter choices, (ii) a fully Bayesian treatment of hyper-parameters, and (iii) estimation of hyper-parameters in an empirical Bayes (EB) manner. The comparison is carried out in an extensive simulation study closely based on 14 real datasets that span a wide range of practical data analysis situations.

The article is organized as follows. Section

Bayesian model averaging [

Assuming that there is one true model among the set of

Under BMA inference, we can express the predictive distribution of a quantity of interest, Δ, such as a parameter or an observable future quantity, as a weighted average of its predictive distributions under the different candidate models:

p(Δ | D) = Σ_k p(Δ | M_k, D) p(M_k | D),

where D denotes the observed data, the sum runs over the candidate models M_k, and p(M_k | D) is the posterior probability of model M_k.

BMA has several desirable theoretical properties [

The next subsection discusses the choice of parameter and model space priors that need to be specified by the user when implementing BMA.

Despite the wide adoption of Bayesian methods in linear models, prior elicitation for linear models is still an open problem. The parameter prior distribution

This is popular because of its computational efficiency in evaluating marginal likelihoods and performing model search. It is also attractive because of its intuitive interpretation arising from analysis of a conceptual sample generated using the same design matrix

Based on an extensive simulation study, Porwal and Raftery [

In terms of theoretical properties, all three priors are model-selection consistent [

Model space priors require specification of the prior probabilities of all models

In the absence of prior information, a common choice is to set

For a fixed value of

Any fixed choice of

However, maximization of (

EB optimization algorithm for

An alternative way to reduce the sensitivity of the posterior distribution to prior assumptions is to use hierarchical modeling and specify a weak hyper-prior for

Under a uniform prior on

Under a Beta-Binomial (BB) prior, the prior expected model size is
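To make this contrast concrete, the prior distribution of model size S can be computed directly under both priors. The sketch below uses plain Python; the values p = 100, π = 0.5, and a Beta(1, 1) hyper-prior are illustrative choices, not the paper's settings:

```python
import math

p = 100  # number of candidate covariates (illustrative choice)

def bernoulli_size_pmf(s, pi=0.5):
    """P(model size = s) under independent Bernoulli(pi) inclusion:
    S ~ Binomial(p, pi)."""
    return math.comb(p, s) * pi**s * (1 - pi) ** (p - s)

def beta_binomial_size_pmf(s, a=1.0, b=1.0):
    """P(model size = s) with pi ~ Beta(a, b) integrated out:
    S ~ Beta-Binomial(p, a, b); a = b = 1 makes every size equally likely."""
    num = math.comb(p, s) * math.gamma(s + a) * math.gamma(p - s + b) * math.gamma(a + b)
    den = math.gamma(p + a + b) * math.gamma(a) * math.gamma(b)
    return num / den

sizes = range(p + 1)
mean_bern = sum(s * bernoulli_size_pmf(s) for s in sizes)
var_bern = sum((s - mean_bern) ** 2 * bernoulli_size_pmf(s) for s in sizes)
mean_bb = sum(s * beta_binomial_size_pmf(s) for s in sizes)
var_bb = sum((s - mean_bb) ** 2 * beta_binomial_size_pmf(s) for s in sizes)

print(f"Bernoulli(0.5):     E[S]={mean_bern:.1f}, Var(S)={var_bern:.1f}")
print(f"Beta-Binomial(1,1): E[S]={mean_bb:.1f}, Var(S)={var_bb:.1f}")
```

With these settings both priors have prior expected model size 50, but the Bernoulli prior has Var(S) = 25 while the Beta-Binomial has Var(S) = 850, illustrating how much more dispersed the hierarchical prior is.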

Prior model size distribution for the Boston Housing and Nutrimouse datasets.

Alternatively, we can use an EB approach to learn

EB optimization algorithm for
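One standard way to compute such an EB estimate is an EM-style update that alternates between weighting models by their posterior probability and setting the inclusion probability to the posterior mean model size divided by p. The sketch below is a generic illustration, not the paper's algorithm: it assumes p is small enough to enumerate all models, and it uses the closed-form Zellner g-prior marginal likelihood (Liang et al. form) and made-up toy data:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def log_marginal_g(y, X, gamma, g):
    """Log marginal likelihood (up to a model-independent constant) of the
    model including the columns flagged in gamma, under Zellner's g-prior
    with a flat prior on the intercept (Liang et al. closed form)."""
    n = len(y)
    yc = y - y.mean()
    k = int(gamma.sum())
    if k == 0:
        return 0.0
    Xg = X[:, gamma] - X[:, gamma].mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xg, yc, rcond=None)
    r2 = 1.0 - np.sum((yc - Xg @ beta) ** 2) / np.sum(yc**2)
    return 0.5 * (n - 1 - k) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))

def eb_inclusion_prob(y, X, g, n_iter=100):
    """EM-style type-II maximum likelihood for the Bernoulli inclusion
    probability pi: E-step weights all models by posterior probability,
    M-step sets pi to the posterior expected model size divided by p."""
    p = X.shape[1]
    models = [np.array(m, dtype=bool) for m in itertools.product([0, 1], repeat=p)]
    logml = np.array([log_marginal_g(y, X, m, g) for m in models])
    sizes = np.array([m.sum() for m in models])
    pi = 0.5
    for _ in range(n_iter):
        logpost = logml + sizes * np.log(pi) + (p - sizes) * np.log(1.0 - pi)
        w = np.exp(logpost - logpost.max())
        w /= w.sum()
        pi = np.clip(w @ sizes / p, 1e-6, 1 - 1e-6)  # guard against 0 or 1
    return float(pi)

# toy data: 2 of 6 candidate covariates are active
n, p = 100, 6
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.standard_normal(n)
pi_hat = eb_inclusion_prob(y, X, g=n)
```

For larger p, where enumeration is infeasible, the E-step expectation is typically replaced by an estimate from posterior samples of the model indicator.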

For Zellner’s

Castillo et al. [

The complexity prior is defined as

To illustrate the effect of different model space priors, we use two datasets from our analysis: Boston Housing

The Bernoulli model space priors are very concentrated around their mean,

Summary of prior moments of model size S under the three model space priors (π is the Bernoulli inclusion probability; a and b are the Beta-Binomial hyper-parameters).

Model prior | E[S] | Var(S)
Bernoulli(π) | pπ | pπ(1 − π)
Beta-Binomial(a, b) | pa/(a + b) | pab(a + b + p)/[(a + b)²(a + b + 1)]
Complexity | – | –

We investigate the performance of different model space priors and parameter prior combinations using an extensive simulation study based closely on real datasets. We evaluate the effect of prior choices for the statistical tasks of parameter point and interval estimation, inference, point and interval prediction, and computation time.

All the parameter and model space prior combinations were implemented using the MC^3 (Markov chain Monte Carlo model composition) Metropolis-Hastings algorithm for sampling from the posterior distribution of models [
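The essence of MC^3 is a Metropolis-Hastings random walk over the model space that proposes flipping one inclusion indicator at a time. The sketch below is a minimal illustration, not the study's implementation; the Zellner g-prior marginal likelihood (Liang et al. closed form), the Bernoulli(0.5) model prior, and the toy data are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_marginal_g(y, X, gamma, g):
    """Log marginal likelihood (up to a model-independent constant) of the
    model with the columns flagged in gamma, under Zellner's g-prior with
    a flat prior on the intercept (Liang et al. closed form)."""
    n = len(y)
    yc = y - y.mean()
    k = int(gamma.sum())
    if k == 0:
        return 0.0
    Xg = X[:, gamma] - X[:, gamma].mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xg, yc, rcond=None)
    r2 = 1.0 - np.sum((yc - Xg @ beta) ** 2) / np.sum(yc**2)
    return 0.5 * (n - 1 - k) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))

def log_model_prior(gamma, pi=0.5):
    """Independent Bernoulli(pi) model space prior."""
    k = int(gamma.sum())
    return k * np.log(pi) + (gamma.size - k) * np.log(1.0 - pi)

def mc3(y, X, g, n_iter=5000):
    """MC^3 sketch: Metropolis-Hastings over models, proposing to flip one
    randomly chosen inclusion indicator per step (no burn-in, for brevity).
    Returns estimated posterior inclusion probabilities."""
    p = X.shape[1]
    gamma = np.zeros(p, dtype=bool)
    cur = log_marginal_g(y, X, gamma, g) + log_model_prior(gamma)
    visits = np.zeros(p)
    for _ in range(n_iter):
        prop = gamma.copy()
        j = rng.integers(p)
        prop[j] = not prop[j]
        new = log_marginal_g(y, X, prop, g) + log_model_prior(prop)
        if np.log(rng.random()) < new - cur:  # symmetric proposal
            gamma, cur = prop, new
        visits += gamma
    return visits / n_iter

# toy data in which only the first two covariates matter
n, p = 200, 8
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] + 1.5 * X[:, 1] + rng.standard_normal(n)
pip = mc3(y, X, g=n)
```

Posterior inclusion probabilities estimated this way should be close to 1 for the two active covariates and small for the rest.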

For the EB methods, we used Algorithm

We based our analysis on 14 publicly available datasets, of which six are available from

Datasets used in the study.

Dataset Name | Sample size (N) | Covariates (p) | Source
College | 777 | 14 |
Bias Correction-Tmax | 7590 | 21 | UCI ML repository
Bias Correction-Tmin | 7590 | 21 | UCI ML repository
SML2010 | 1373 | 22 | UCI ML repository
Bike sharing-daily | 731 | 28 | UCI ML repository
Bike sharing-hourly | 17379 | 32 | UCI ML repository
Superconductivity | 21263 | 81 | UCI ML repository
Diabetes | 442 | 64 |
Ozone | 330 | 44 |
Boston housing | 506 | 103 |
NIR | 166 | 225 |
Nutrimouse | 40 | 120 |
Multidrug | 60 | 853 |
Liver toxicity | 64 | 3116 |

For each dataset, we selected a data-generating model that closely approximates the real dataset. We carried out all-subsets regression for datasets with

We used the data-generating model and parametric bootstrapping to generate 100 bootstrapped datasets with the same design matrix
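The parametric bootstrap step can be sketched in a few lines: hold the design matrix fixed, and repeatedly simulate responses from the fitted data-generating model. The coefficients and noise level below are hypothetical placeholders, not the paper's estimates:

```python
import numpy as np

rng = np.random.default_rng(1)

def parametric_bootstrap(X, beta_hat, sigma_hat, n_boot=100):
    """Simulate bootstrap responses from the fitted data-generating model
    y* = X @ beta_hat + eps*, with eps* ~ N(0, sigma_hat^2), keeping the
    design matrix X fixed across replications."""
    n = X.shape[0]
    return [X @ beta_hat + sigma_hat * rng.standard_normal(n)
            for _ in range(n_boot)]

# illustrative usage with a made-up fitted model
X = rng.standard_normal((50, 3))
boots = parametric_bootstrap(X, beta_hat=np.array([1.0, 0.0, -2.0]), sigma_hat=0.5)
```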

We compared the performance of different parameter and model space prior combinations on these simulated datasets using the following metrics:

We also compared methods based on their out-of-sample predictive performance. We divided each dataset into 100 random 75%–25% train-test splits. We trained the methods on the training data and used the test data to assess the predictive performance using the metrics described below:

We also recorded the average size of the sampled models for each dataset and the average CPU time (in seconds) to carry out BMA for one bootstrapped dataset.
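Interval estimates and predictions are scored with the Mean Interval Score (MIS). For reference, here is a minimal sketch of the standard interval score of Gneiting and Raftery for central (1 − α) intervals; the α = 0.05 default and the toy numbers are illustrative:

```python
import numpy as np

def interval_score(lower, upper, y, alpha=0.05):
    """Interval score for a central (1 - alpha) interval [lower, upper]:
    the interval's width, plus a 2/alpha penalty per unit by which the
    observation y falls outside it. Lower scores are better."""
    lower, upper, y = map(np.asarray, (lower, upper, y))
    width = upper - lower
    below = (2.0 / alpha) * np.maximum(lower - y, 0.0)
    above = (2.0 / alpha) * np.maximum(y - upper, 0.0)
    return width + below + above

# mean interval score over three observations, one of which (y = 5)
# falls outside the interval [-1, 2]
y = np.array([0.0, 1.0, 5.0])
mis = interval_score(-1.0, 2.0, y).mean()
```

Averaging the score over all observations in a test set gives the MIS reported in the results.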

The results are shown in Table

For each metric, we color the methods based on their performance relative to the reference method. A method is colored green if it performed similarly to or better than the reference method, yellow if it performed somewhat worse, and orange if it performed substantially worse.

For all choices of parameter prior, Beta-Binomial

Most parameter and model prior combinations selected sparser models than the

We also note that the EB model space priors tended to outperform the corresponding SDM model space priors when combined with the Hyper-

Performance of different parameter prior and model space prior combinations for inference in linear regression under model uncertainty: “PointEst” is the RMSE for point estimation, “IntEst” is the Mean Interval Score (MIS) for interval estimation, “Inference” is one minus the area under the precision-recall curve (AUPRC), “Prediction” is the RMSE for point prediction, and “IntPred” is the MIS for interval prediction. “N vars” is the average number of variables used for the task. All metrics are standardized to equal 1 for the

We have compared BMA techniques with different choices of model space priors and parameter priors using an empirical study based closely on real datasets. We found that the Beta-Binomial

We are not the first to compare model space priors in the presence of model uncertainty. Past comparisons have either focused on a subset of the model priors discussed here, or evaluated BMA methods for only a subset of the statistical tasks considered here. In several cases, they also tended to use simulation designs that are at best loosely related to empirical data observed in practice.

Ley and Steel [

Scott and Berger [

We found the complexity priors [

We have focused attention on independent model priors, i.e. priors in which the inclusion of each variable is statistically independent of that of the other variables. However, non-independent default priors have been proposed as well. George [

We thank Abel Rodriguez for helpful discussions.