Estimating a prediction function is a fundamental component of many data analyses. The super learner ensemble, a particular implementation of stacking, has desirable theoretical properties and has been used successfully in many applications. Dimension reduction can be accomplished by using variable screening algorithms (screeners), including the lasso, within the ensemble prior to fitting other prediction algorithms. However, the performance of a super learner using the lasso for dimension reduction has not been fully explored in cases where the lasso is known to perform poorly. We provide empirical results that suggest that a diverse set of candidate screeners should be used to protect against poor performance of any one screener, similar to the guidance for choosing a library of prediction algorithms for the super learner. These results are further illustrated through the analysis of HIV-1 antibody data.
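As a rough illustration only, the following Python sketch mimics this recipe with scikit-learn rather than the R SuperLearner package typically used for the super learner: each candidate learner is paired with a screener, and the lasso screener is deliberately mixed with a different one so that no single screener can sink the ensemble. The data, learners, and tuning choices below are assumptions for illustration, not the paper's configuration.

```python
# Sketch of a stacked ensemble with screener-learner pairs (illustrative;
# the super learner's meta-learner is typically a constrained regression).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.feature_selection import SelectFromModel, SelectKBest, f_regression
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.pipeline import make_pipeline

X, y = make_regression(n_samples=200, n_features=500, n_informative=10,
                       noise=1.0, random_state=0)

learners = [
    # Lasso screener: keep the variables with nonzero lasso coefficients.
    ("lasso+lm", make_pipeline(SelectFromModel(LassoCV(cv=5)),
                               LinearRegression())),
    # A different screener (top-k marginal F-statistics) as a safeguard.
    ("fstat+rf", make_pipeline(SelectKBest(f_regression, k=20),
                               RandomForestRegressor(random_state=0))),
    # One learner with no screening at all.
    ("rf_all", RandomForestRegressor(random_state=0)),
]

# The meta-learner is fit on cross-validated predictions, the same device
# the super learner uses to weight its candidate algorithms.
ensemble = StackingRegressor(estimators=learners,
                             final_estimator=LinearRegression(), cv=5)
ensemble.fit(X, y)
```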
In oncology therapy development, Simon’s two-stage design is commonly employed in early-phase clinical trials to assess the preliminary efficacy of a single dose, typically the maximum tolerated dose (MTD) or the maximum assessed dose (MAD). In this design, a dose may be terminated at the first stage if the anti-tumor activity is insufficient, or it may proceed to the second stage for further evaluation with more subjects. To support selection of a dose with a better benefit-risk profile and to meet the increasing need for study designs that explore dose-response relationships, we extend Simon’s two-stage design to evaluate two doses and to allow early termination for success in addition to futility. The proposed method derives decision rules and sample sizes for optimal study designs that minimize the expected or overall sample size while controlling the type I error rate and achieving the desired power.
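For background, the sketch below computes the exact binomial operating characteristics of the classical single-dose Simon design (futility stopping only), which is the logic the proposed method layers a second dose and success stopping on top of. The design parameters used are Simon's published optimal design for p0 = 0.1 versus p1 = 0.3 with alpha = 0.05 and power 0.8.

```python
# Exact operating characteristics of a classical Simon two-stage design.
from scipy.stats import binom

def simon_oc(p, n1, r1, n, r):
    """Probability of declaring efficacy, and probability of early
    termination, when the true response rate is p. Stage 1: stop for
    futility if responses <= r1 out of n1; otherwise enroll to n total
    and declare efficacy if total responses exceed r."""
    pet = binom.cdf(r1, n1, p)  # probability of early termination
    reject = sum(
        binom.pmf(x1, n1, p) * (1 - binom.cdf(r - x1, n - n1, p))
        for x1 in range(r1 + 1, n1 + 1)
    )
    return reject, pet

# Simon's optimal design for p0 = 0.1 vs p1 = 0.3: r1/n1 = 1/10, r/n = 5/29.
# The type I error is the rejection probability evaluated at p0.
alpha, pet0 = simon_oc(0.10, n1=10, r1=1, n=29, r=5)
power, _ = simon_oc(0.30, n1=10, r1=1, n=29, r=5)
print(f"type I error = {alpha:.3f}, power = {power:.3f}, PET(p0) = {pet0:.3f}")
```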
Variable selection is widely used in all application areas of data analytics, ranging from optimal selection of genes in large-scale microarray studies, to optimal selection of biomarkers for targeted therapy in cancer genomics, to selection of optimal predictors in business analytics. A formal way to perform this selection under the Bayesian approach is to select the model with the highest posterior probability. The problem may be thought of as an optimization problem over the model space in which the objective function is the posterior probability of the model. We propose to carry out this optimization using simulated annealing and illustrate its feasibility in high-dimensional problems. Various simulation studies show the new approach to be efficient. Theoretical justifications are provided and applications to high-dimensional datasets are discussed. The proposed method is implemented in the R package sahpm for general use and is available on CRAN.
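A minimal sketch of the search itself follows, assuming BIC as a computable stand-in for the log posterior model probability (the paper's sahpm package targets a genuine posterior); the proposal mechanism, cooling schedule, and all tuning constants are illustrative assumptions.

```python
# Simulated annealing over a binary variable-inclusion vector.
import numpy as np

def bic(X, y, gamma):
    """BIC of the linear model with an intercept plus the columns in gamma."""
    n = len(y)
    Xg = np.column_stack([np.ones(n), X[:, gamma.astype(bool)]])
    beta, *_ = np.linalg.lstsq(Xg, y, rcond=None)
    rss = np.sum((y - Xg @ beta) ** 2)
    return n * np.log(rss / n) + Xg.shape[1] * np.log(n)

def anneal(X, y, n_iter=5000, t0=1.0, cooling=0.999, seed=0):
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    gamma = rng.integers(0, 2, size=p)   # random starting model
    best = cur = bic(X, y, gamma)
    best_gamma, t = gamma.copy(), t0
    for _ in range(n_iter):
        cand = gamma.copy()
        cand[rng.integers(p)] ^= 1       # flip one inclusion bit
        new = bic(X, y, cand)
        # Metropolis step: always accept improvements, occasionally accept
        # worse models so the chain can escape local optima.
        if new < cur or rng.random() < np.exp((cur - new) / t):
            gamma, cur = cand, new
            if cur < best:
                best, best_gamma = cur, gamma.copy()
        t *= cooling                     # geometric cooling schedule
    return best_gamma, best

# Example: recover a sparse truth from simulated data.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 30))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + rng.standard_normal(100)
gamma_hat, score = anneal(X, y)
print(np.flatnonzero(gamma_hat), score)
```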
The supersaturated design is often used to discover important factors in an experiment with a large number of factors and a small number of runs. We propose a method for constructing supersaturated designs with small coherence. Such designs are useful for variable selection methods such as the Lasso. Examples are provided to illustrate the proposed method.
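The construction method is the paper's contribution and is not reproduced here; the sketch below only evaluates the coherence criterion itself (the largest absolute correlation between distinct design columns, the quantity on which Lasso recovery guarantees depend) on a randomly generated two-level design.

```python
# Coherence of a design matrix: max |inner product| between distinct
# unit-norm columns. Smaller coherence is better for Lasso selection.
import numpy as np

def coherence(X):
    Xn = X / np.linalg.norm(X, axis=0)   # normalize columns to unit length
    G = Xn.T @ Xn                        # Gram matrix of column correlations
    np.fill_diagonal(G, 0.0)             # ignore self-correlations
    return np.abs(G).max()

# A random +/-1 supersaturated design: n = 8 runs, m = 20 factors.
rng = np.random.default_rng(0)
D = rng.choice([-1.0, 1.0], size=(8, 20))
print(coherence(D))
```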
Bayesian model averaging (BMA) provides a coherent way to account for model uncertainty in statistical inference tasks. BMA requires specification of model space priors and parameter space priors. In this article we focus on comparing different model space priors in the presence of model uncertainty. We consider eight reference model space priors used in the literature and three adaptive parameter priors recommended by Porwal and Raftery [37]. We assess the performance of these combinations of prior specifications for variable selection in linear regression models for the statistical tasks of parameter estimation, interval estimation, inference, and point and interval prediction. We carry out an extensive simulation study based on 14 real datasets representing a range of situations encountered in practice. We find that beta-binomial model space priors specified in terms of the prior probability of model size perform best on average across statistical tasks and datasets, outperforming priors that are uniform across models. Recently proposed complexity priors perform relatively poorly.
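As a concrete sketch of the winning prior class, the snippet below computes the prior on model size induced by a beta-binomial model space prior in the common parameterization from the literature, p(model with k of p variables) proportional to B(a + k, b + p - k) / B(a, b), and contrasts it with the size distribution induced by a prior that is uniform over models; the hyperparameters a = b = 1 are an illustrative choice, not the paper's recommendation.

```python
# Prior on model size: beta-binomial model prior vs. uniform over models.
import numpy as np
from scipy.special import betaln, gammaln
from scipy.stats import binom

def log_prior_size(p, a=1.0, b=1.0):
    """Log prior probability of each model size k = 0..p under the
    beta-binomial model space prior (C(p,k) models of size k, each with
    prior mass B(a + k, b + p - k) / B(a, b))."""
    k = np.arange(p + 1)
    log_choose = gammaln(p + 1) - gammaln(k + 1) - gammaln(p - k + 1)
    return log_choose + betaln(a + k, b + p - k) - betaln(a, b)

p = 20
bb = np.exp(log_prior_size(p))            # a = b = 1: flat over model sizes
unif = binom.pmf(np.arange(p + 1), p, 0.5)  # uniform over models: Binomial(p, 1/2)
print(bb[:4].round(4))    # equal mass on every size, including sparse ones
print(unif[:4].round(4))  # nearly no mass on sparse models
```

With a = b = 1 the beta-binomial prior spreads mass evenly over model sizes, whereas the uniform-over-models prior concentrates almost all its mass near size p/2, which is one intuition for the performance gap reported here.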