Double generalized linear models provide a flexible framework for modeling data by allowing the mean and the dispersion to vary across observations. Common members of the exponential dispersion family including the Gaussian, Poisson, compound Poisson-gamma (CP-g), Gamma and inverse-Gaussian are known to admit such models. The lack of their use can be attributed to ambiguities that exist in model specification under a large number of covariates and complications that arise when data display complex spatial dependence. In this work we consider a hierarchical specification for the CP-g model with a spatial random effect. The spatial effect is targeted at performing uncertainty quantification by modeling dependence within the data arising from location based indexing of the response. We focus on a Gaussian process specification for the spatial effect. Simultaneously, we tackle the problem of model specification for such models using Bayesian variable selection. It is effected through a continuous spike and slab prior on the model parameters, specifically the fixed effects. The novelty of our contribution lies in the Bayesian frameworks developed for such models. We perform various synthetic experiments to showcase the accuracy of our frameworks. They are then applied to analyze automobile insurance premiums in Connecticut, for the year of 2008.
Anomaly detection plays an important role in traffic operations and control. Missingness in spatial-temporal datasets prohibits anomaly detection algorithms from learning characteristic rules and patterns due to the lack of large amounts of data. This paper proposes an anomaly detection scheme for the 2021 Algorithms for Threat Detection (ATD) challenge based on Gaussian process models that generate features used in a logistic regression model which leads to high prediction accuracy for sparse traffic flow data with a large proportion of missingness. The dataset is provided by the National Science Foundation (NSF) in conjunction with the National Geospatial-Intelligence Agency (NGA), and it consists of thousands of labeled traffic flow records for 400 sensors from 2011 to 2020. Each sensor is purposely downsampled by NSF and NGA in order to simulate missing completely at random, and the missing rates are 99%, 98%, 95%, and 90%. Hence, it is challenging to detect anomalies from the sparse traffic flow data. The proposed scheme makes use of traffic patterns at different times of day and on different days of week to recover the complete data. The proposed anomaly detection scheme is computationally efficient by allowing parallel computation on different sensors. The proposed method is one of the two top performing algorithms in the 2021 ATD challenge.