The timing of longitudinal measurements may depend on the outcome or on disease severity. In biomedical studies relying on clinical encounter data, patients often have dense, irregularly spaced visits when their health is worse. At the same time, the longitudinal measurements themselves may be affected during these periods of irregular visiting. Ignoring the outcome-dependent visiting process when constructing a longitudinal disease progression model can produce biased results. We propose a Bayesian joint model linking a mixed-effects model for the longitudinal marker with a Weibull proportional hazards model with a log frailty for the visiting process, adjusting both the longitudinal marker and the visiting process for covariates. We examine different random-effects structures and their performance in characterizing disease trajectory. Motivated by clinical data on cystic fibrosis lung disease, we estimate the longitudinal process of lung function decline; individuals with lower lung function tend to have more frequent clinical visits than those with higher lung function. Simulation studies suggest that incorporating a time-dependent Gaussian process is more important for model fit than adding the survival model via joint modeling; the random-intercepts model exhibits the largest bias, especially when there is an outcome-dependent visiting process.
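The abstract does not give the exact parameterization, so the following is only a hedged sketch of one plausible form of the joint specification; all symbols (the marker $Y_{ij}$, design vectors, frailty $u_i$, and the link between submodels) are assumptions rather than the authors' notation.

```latex
% A minimal sketch of one plausible joint specification; all symbols are
% assumptions, not taken from the abstract.
% Longitudinal submodel for marker Y_{ij} of subject i at visit time t_{ij}:
\begin{aligned}
Y_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}
          + \mathbf{z}_{ij}^{\top}\mathbf{b}_i
          + W_i(t_{ij}) + \varepsilon_{ij},
          \qquad \varepsilon_{ij} \sim N(0,\sigma^2), \\
\mathbf{b}_i &\sim N(\mathbf{0},\boldsymbol{\Sigma}_b), \qquad
  W_i(\cdot) \sim \mathcal{GP}\bigl(0,\, k(\cdot,\cdot)\bigr)
  \quad \text{(time-dependent Gaussian process)}, \\
% Visiting-process submodel: Weibull proportional hazards with log frailty u_i:
\lambda_i(t) &= \kappa t^{\kappa - 1}
  \exp\bigl(\gamma_0 + \mathbf{v}_i^{\top}\boldsymbol{\gamma} + u_i\bigr),
  \qquad u_i \sim N(0, \sigma_u^2).
\end{aligned}
% A common way to link the submodels is through shared or correlated random
% effects (e.g. letting u_i depend on b_i); the abstract does not specify the
% link, so this remains an assumption.
```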
Double generalized linear models provide a flexible framework for modeling data by allowing both the mean and the dispersion to vary across observations. Common members of the exponential dispersion family, including the Gaussian, Poisson, compound Poisson-gamma (CP-g), gamma, and inverse-Gaussian distributions, are known to admit such models. Their limited use can be attributed to ambiguities in model specification under a large number of covariates and to complications that arise when data display complex spatial dependence. In this work we consider a hierarchical specification for the CP-g model with a spatial random effect. The spatial effect provides uncertainty quantification by modeling dependence within the data arising from the location-based indexing of the response; we focus on a Gaussian process specification for this effect. Simultaneously, we tackle the problem of model specification using Bayesian variable selection, effected through a continuous spike-and-slab prior on the model parameters, specifically the fixed effects. The novelty of our contribution lies in the Bayesian frameworks developed for such models. We perform various synthetic experiments to showcase the accuracy of our frameworks and then apply them to analyze automobile insurance premiums in Connecticut for the year 2008.
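As a hedged illustration only (the abstract does not spell out the hierarchy, so every symbol below is an assumption), a double GLM of this kind with a spatial Gaussian process and a continuous spike-and-slab prior might be written as:

```latex
% A hedged sketch of one possible hierarchical CP-g specification; all symbols
% are assumptions rather than the authors' notation.
\begin{aligned}
y(\mathbf{s}_i) \mid \mu(\mathbf{s}_i), \phi(\mathbf{s}_i)
  &\sim \mathrm{CP\text{-}g}\bigl(\mu(\mathbf{s}_i), \phi(\mathbf{s}_i)\bigr), \\
% Mean model with fixed effects and a spatial random effect w(s):
\log \mu(\mathbf{s}_i) &= \mathbf{x}(\mathbf{s}_i)^{\top}\boldsymbol{\beta} + w(\mathbf{s}_i),
  \qquad w(\cdot) \sim \mathcal{GP}\bigl(0,\, \sigma_w^2\,\rho_{\theta}(\cdot,\cdot)\bigr), \\
% Dispersion model (the "double" part of the double GLM):
\log \phi(\mathbf{s}_i) &= \mathbf{x}(\mathbf{s}_i)^{\top}\boldsymbol{\gamma}, \\
% Continuous spike-and-slab prior on each fixed effect for variable selection:
\beta_j \mid \delta_j &\sim (1-\delta_j)\, N(0, \tau_0^2) + \delta_j\, N(0, \tau_1^2),
  \qquad \tau_0^2 \ll \tau_1^2, \qquad \delta_j \sim \mathrm{Bernoulli}(\pi).
\end{aligned}
```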
Anomaly detection plays an important role in traffic operations and control. Missingness in spatio-temporal datasets prevents anomaly detection algorithms from learning characteristic rules and patterns because too little data remain. This paper proposes an anomaly detection scheme for the 2021 Algorithms for Threat Detection (ATD) challenge based on Gaussian process models that generate features for a logistic regression model, yielding high prediction accuracy on sparse traffic flow data with a large proportion of missingness. The dataset is provided by the National Science Foundation (NSF) in conjunction with the National Geospatial-Intelligence Agency (NGA) and consists of thousands of labeled traffic flow records for 400 sensors from 2011 to 2020. Each sensor is purposely downsampled by NSF and NGA to simulate data missing completely at random, with missing rates of 99%, 98%, 95%, and 90%; detecting anomalies from such sparse traffic flow data is therefore challenging. The proposed scheme uses traffic patterns at different times of day and on different days of the week to recover the complete data, and it is computationally efficient because different sensors can be processed in parallel. The proposed method is one of the two top-performing algorithms in the 2021 ATD challenge.
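Purely as an illustrative sketch (the challenge pipeline, feature set, and variable names below are assumptions, not taken from the abstract), the per-sensor GP-plus-logistic-regression idea could look like the following; fitting a separate GP for each sensor is what allows the computation to be parallelized across sensors.

```python
# Illustrative sketch only: the actual ATD pipeline and feature construction are
# not specified in the abstract; names and the synthetic data below are assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.linear_model import LogisticRegression

def fit_sensor_gp(hours, weekdays, flows):
    """Fit a GP to one sensor's sparse flow data, using time-of-day and
    day-of-week as inputs; different sensors can be fit in parallel."""
    X = np.column_stack([hours, weekdays])
    kernel = RBF(length_scale=[3.0, 1.0]) + WhiteKernel(noise_level=1.0)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(X, flows)
    return gp

def gp_features(gp, hours, weekdays, flows):
    """Turn GP predictions into classifier features: predicted mean, predictive
    standard deviation, and the standardized residual of the observed flow."""
    X = np.column_stack([hours, weekdays])
    mean, sd = gp.predict(X, return_std=True)
    resid = (flows - mean) / np.maximum(sd, 1e-6)
    return np.column_stack([resid, mean, sd])

# Toy usage: synthetic flows with a daily pattern, labeled anomalous when they
# deviate strongly from that pattern (stand-in for the challenge labels).
rng = np.random.default_rng(0)
hours = rng.uniform(0, 24, 200)
weekdays = rng.integers(0, 7, 200).astype(float)
pattern = 50 + 10 * np.sin(2 * np.pi * hours / 24)
flows = pattern + rng.normal(0, 2, 200)
labels = (np.abs(flows - pattern) > 3).astype(int)

gp = fit_sensor_gp(hours, weekdays, flows)
clf = LogisticRegression().fit(gp_features(gp, hours, weekdays, flows), labels)
```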