Meta-analysis is a powerful tool for assessing drug safety by combining treatment-related toxicological findings across multiple studies, as individual clinical trials are typically underpowered for detecting adverse drug effects. However, incomplete reporting of adverse events (AEs) in published clinical studies is frequently encountered, especially when the observed number of AEs falls below a pre-specified, study-dependent threshold. Ignoring this censored AE information, which typically concerns lower-frequency events, can significantly bias the estimated incidence rate of AEs. Despite its importance, this prevalent issue in meta-analysis has received little statistical or analytic attention in the literature. To address this challenge, we propose a Bayesian approach that accommodates censored and possibly rare AEs in meta-analysis of safety data. Through simulation studies, we demonstrate that the proposed method can improve the accuracy of point and interval estimation of incidence probabilities, particularly in the presence of censored data. Overall, the proposed method provides a practical solution that can facilitate better-informed decisions regarding drug safety.
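As a rough illustration of why censored counts matter, the sketch below treats a study whose AE count is unreported as contributing the probability that the count fell below its reporting cutoff, rather than dropping the study. It is a deliberately simplified, single-rate version and not the model proposed in the abstract. The common incidence probability, the Beta(1, 1) prior, the grid approximation, the cutoff values, and the study data are all assumptions made for illustration.

```python
import math

def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def log_binom_cdf(k, n, p):
    """Log of P(Y <= k) for Y ~ Binomial(n, p), via log-sum-exp."""
    logs = [log_binom_pmf(j, n, p) for j in range(k + 1)]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))

def posterior_mean(studies, use_censored=True, a=1.0, b=1.0, grid_size=2000):
    """Grid approximation to the posterior mean of a common AE probability.

    studies: list of (n, y, cutoff); y is None when the count is censored,
    meaning only y < cutoff is known (hypothetical reporting rule).
    """
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    log_w = []
    for p in grid:
        lw = (a - 1) * math.log(p) + (b - 1) * math.log(1 - p)  # Beta prior
        for n, y, cutoff in studies:
            if y is not None:
                lw += log_binom_pmf(y, n, p)           # fully reported count
            elif use_censored:
                lw += log_binom_cdf(cutoff - 1, n, p)  # censored: y < cutoff
        log_w.append(lw)
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    return sum(p * wi for p, wi in zip(grid, w)) / sum(w)

# Made-up data: two studies report counts, three report nothing because
# their counts fell below a cutoff of 5.
studies = [(100, 8, 5), (120, 10, 5), (80, None, 5), (150, None, 5), (90, None, 5)]
mean_with_censoring = posterior_mean(studies, use_censored=True)
mean_ignoring_censoring = posterior_mean(studies, use_censored=False)
print(mean_with_censoring, mean_ignoring_censoring)
```

Because the censored-count likelihood decreases in the incidence probability, including it pulls the estimate downward relative to an analysis that uses only the studies with reported counts.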
Graphical models have witnessed significant growth and usage in spatial data science for modeling data referenced over a massive number of spatial-temporal coordinates. Much of this literature has focused on a single or relatively few spatially dependent outcomes. Recent attention has turned to modeling and inference for a substantially larger number of outcomes. While spatial factor models and multivariate basis expansions occupy a prominent place in this domain, this article elucidates a recent approach, graphical Gaussian Processes, that exploits the notion of conditional independence among a very large number of spatial processes to build scalable graphical models for fully model-based Bayesian analysis of multivariate spatial data.
Double generalized linear models provide a flexible framework for modeling data by allowing the mean and the dispersion to vary across observations. Common members of the exponential dispersion family, including the Gaussian, Poisson, compound Poisson-gamma (CP-g), Gamma, and inverse-Gaussian, are known to admit such models. Their limited use can be attributed to ambiguities in model specification under a large number of covariates and to complications that arise when data display complex spatial dependence. In this work we consider a hierarchical specification for the CP-g model with a spatial random effect. The spatial effect supports uncertainty quantification by modeling dependence within the data that arises from location-based indexing of the response. We focus on a Gaussian process specification for the spatial effect. Simultaneously, we tackle the problem of model specification for such models using Bayesian variable selection, effected through a continuous spike-and-slab prior on the model parameters, specifically the fixed effects. The novelty of our contribution lies in the Bayesian frameworks developed for such models. We perform various synthetic experiments to showcase the accuracy of our frameworks, which are then applied to analyze automobile insurance premiums in Connecticut for the year 2008.
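To make the CP-g response concrete, the following sketch draws from a compound Poisson-gamma distribution using its standard Tweedie parameterization with mean mu, dispersion phi, and power parameter between 1 and 2. Under that parameterization the mean is mu and the variance is phi * mu**power. The log links, the coefficient values, and the function names are hypothetical. The spatial random effect and the spike-and-slab prior described above are not included.

```python
import math
import random

def sample_poisson(rng, lam):
    """Poisson(lam) draw: count unit-rate exponential arrivals in [0, lam]."""
    count = 0
    t = rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count

def sample_cpg(rng, mu, phi, power=1.5):
    """One compound Poisson-gamma draw with mean mu and variance phi * mu**power."""
    lam = mu ** (2 - power) / (phi * (2 - power))   # Poisson rate
    shape = (2 - power) / (power - 1)               # gamma shape of each summand
    scale = phi * (power - 1) * mu ** (power - 1)   # gamma scale of each summand
    n = sample_poisson(rng, lam)
    return sum((rng.gammavariate(shape, scale) for _ in range(n)), 0.0)

def dglm_params(x, beta=(0.2, 0.5), gamma=(-0.1, 0.3)):
    """Hypothetical double-GLM links: both mean and dispersion depend on x."""
    mu = math.exp(beta[0] + beta[1] * x)
    phi = math.exp(gamma[0] + gamma[1] * x)
    return mu, phi

rng = random.Random(1)
draws = [sample_cpg(rng, mu=2.0, phi=1.0) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
zero_fraction = sum(d == 0.0 for d in draws) / len(draws)
print(sample_mean, zero_fraction)
```

The point mass at zero, visible in `zero_fraction`, is what makes this family a natural candidate for data such as insurance amounts, where many observations are exactly zero and the rest are positive and continuous.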
Anomaly detection plays an important role in traffic operations and control. Missingness in spatial-temporal datasets prevents anomaly detection algorithms from learning characteristic rules and patterns because too little data remains. This paper proposes an anomaly detection scheme for the 2021 Algorithms for Threat Detection (ATD) challenge. The scheme uses Gaussian process models to generate features for a logistic regression model, which leads to high prediction accuracy for sparse traffic flow data with a large proportion of missingness. The dataset is provided by the National Science Foundation (NSF) in conjunction with the National Geospatial-Intelligence Agency (NGA), and it consists of thousands of labeled traffic flow records for 400 sensors from 2011 to 2020. Each sensor is purposely downsampled by NSF and NGA to simulate data missing completely at random, with missing rates of 99%, 98%, 95%, and 90%. Hence, it is challenging to detect anomalies from the sparse traffic flow data. The proposed scheme makes use of traffic patterns at different times of day and on different days of the week to recover the complete data. The scheme is computationally efficient because it allows parallel computation across sensors. The proposed method is one of the two top-performing algorithms in the 2021 ATD challenge.
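The general idea of turning a Gaussian process fit into a feature for a logistic model can be sketched as follows. This is a toy version, not the challenge entry. The squared-exponential kernel, its length scale, the standardized flow values, and the logistic coefficients are all hypothetical, and in practice the coefficients would be estimated from labeled records rather than fixed by hand.

```python
import math

def rbf(s, t, length=3.0, var=1.0):
    """Squared-exponential kernel over time of day (hours)."""
    return var * math.exp(-0.5 * ((s - t) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def gp_predict(times, flows, t_new, noise=0.1):
    """GP predictive mean and variance for a new reading at t_new."""
    mean = sum(flows) / len(flows)
    K = [[rbf(s, t) + (noise if i == j else 0.0) for j, t in enumerate(times)]
         for i, s in enumerate(times)]
    k_star = [rbf(t_new, t) for t in times]
    alpha = solve(K, [f - mean for f in flows])
    w = solve(K, k_star)
    pred_mean = mean + sum(k * a for k, a in zip(k_star, alpha))
    pred_var = rbf(t_new, t_new) - sum(k * wi for k, wi in zip(k_star, w)) + noise
    return pred_mean, pred_var

def anomaly_feature(times, flows, t_new, y_new):
    """Standardized residual of a reading against the GP prediction."""
    m, v = gp_predict(times, flows, t_new)
    return (y_new - m) / math.sqrt(v)

def anomaly_probability(z, intercept=-4.0, slope=1.5):
    """Logistic model on |z|; the coefficients are placeholders, not fitted."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * abs(z))))

# Sparse, standardized flow readings for one sensor (made-up values).
times = [1.0, 5.0, 9.0, 13.0, 17.0, 21.0]
flows = [-1.0, -0.4, 0.9, 0.6, 1.1, -0.5]
z_typical = anomaly_feature(times, flows, 11.0, 0.8)
z_unusual = anomaly_feature(times, flows, 11.0, 5.0)
print(anomaly_probability(z_typical), anomaly_probability(z_unusual))
```

Since each sensor gets its own small GP fit, the per-sensor computations are independent of one another, which is the property that allows them to run in parallel.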
Joint species distribution modeling is attracting increasing attention in the literature, in recognition of the fact that single-species modeling fails to account for the expected dependence/interaction between species. This short paper offers a discussion that attempts to illuminate five noteworthy technical issues associated with such modeling in the context of plant data. In this setting, the joint species distribution work in the literature considers several types of species data collection. For convenience of discussion, we focus on joint modeling of presence/absence data. For such data, the primary modeling strategy has been the introduction of latent multivariate normal random variables.
These issues address the following: (i) how the observed presence/absence data are linked to the latent normal variables, as well as the resulting implications for modeling the data sites as independent or spatially dependent, (ii) the incompatibility of point-referenced and areal-referenced presence/absence data in spatial modeling of species distribution, (iii) the effect of modeling species independently/marginally rather than jointly within site, with regard to assessing species distribution, (iv) the interpretation of species dependence under a latent multivariate normal specification, and (v) the interpretation of clustering of species associated with specific joint species distribution modeling specifications.
It is hoped that, by attempting to clarify these issues, ecological modelers and quantitative ecologists will be able to better appreciate some subtleties that are implicit in this growing collection of modeling ideas. In this regard, this paper can serve as a useful companion piece to the recent survey/comparison article by [33] in Methods in Ecology and Evolution.
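A small simulation can illustrate the latent multivariate normal construction for presence/absence data and why joint and marginal modeling differ, as raised in issues (iii) and (iv). It assumes that presence is recorded when a species' latent normal variable exceeds zero (a probit-type link) and uses two species with a hypothetical latent correlation `rho`. Both choices are made only for illustration and are not taken from the paper.

```python
import math
import random

def simulate_presence_absence(n_sites, rho, mean_a=0.0, mean_b=0.0, seed=0):
    """Draw (presence_a, presence_b) at independent sites from a latent
    bivariate normal with unit variances and correlation rho."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_sites):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        z_a = mean_a + e1
        z_b = mean_b + rho * e1 + math.sqrt(1.0 - rho ** 2) * e2
        data.append((int(z_a > 0.0), int(z_b > 0.0)))  # presence if latent > 0
    return data

rho = 0.6
sites = simulate_presence_absence(50000, rho)
p_a = sum(a for a, _ in sites) / len(sites)
p_b = sum(b for _, b in sites) / len(sites)
p_both = sum(a * b for a, b in sites) / len(sites)

# With zero latent means, the co-occurrence probability has the closed form
# 1/4 + arcsin(rho) / (2 * pi).
p_both_exact = 0.25 + math.asin(rho) / (2.0 * math.pi)
print(p_a, p_b, p_both, p_both_exact, p_a * p_b)
```

Modeling each species marginally would predict co-occurrence at roughly `p_a * p_b`, which understates the joint probability when the latent correlation is positive. Note also that `rho` describes dependence on the latent scale, so it is not the same quantity as the correlation between the observed binary indicators.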