Sequential change detection is a classical problem with a variety of applications. However, the majority of prior work has been parametric, for example, focusing on exponential families. We develop a fundamentally new and general framework for sequential change detection when the pre- and post-change distributions are nonparametrically specified (and thus composite). Our procedures come with clean, nonasymptotic bounds on the average run length (frequency of false alarms). In certain nonparametric cases (like sub-Gaussian or sub-exponential), we also provide near-optimal bounds on the detection delay following a changepoint. The primary technical tool that we introduce is called an e-detector, which is composed of sums of e-processes—a fundamental generalization of nonnegative supermartingales—that are started at consecutive times. We first introduce simple Shiryaev-Roberts and CUSUM-style e-detectors, and then show how to design their mixtures in order to achieve both statistical and computational efficiency. Our e-detector framework can be instantiated to recover classical likelihood-based procedures for parametric problems, and it also yields the first change detection methods for many nonparametric problems. As a running example, we tackle the problem of detecting changes in the mean of a bounded random variable without i.i.d. assumptions, with an application to tracking the performance of a basketball team over multiple seasons.
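To make the central construction concrete, here is a minimal sketch in our own notation (not necessarily the authors' exact formulation). Let $E^{(j)} = (E^{(j)}_t)_{t \ge j}$ denote an e-process started at time $j$, so that under the pre-change regime $\mathbb{E}[E^{(j)}_\tau] \le 1$ for every stopping time $\tau$. The Shiryaev-Roberts-style and CUSUM-style e-detectors then combine these processes as
$$
M^{\mathrm{SR}}_t \;=\; \sum_{j=1}^{t} E^{(j)}_t, \qquad M^{\mathrm{CU}}_t \;=\; \max_{1 \le j \le t} E^{(j)}_t,
$$
and an alarm is declared at the first time $t$ at which the chosen detector crosses a threshold $c$; the nonasymptotic false-alarm guarantees referred to above are of the flavor that the average run length of such a rule is at least on the order of the threshold $c$.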
In this paper, we present the U.S. Mental Health Dashboard, an R Shiny web application that facilitates exploratory data analysis of U.S. mental health data collected through national surveys. Mental health affects almost every aspect of people’s lives, including their social relationships, substance use, academic success, professional productivity, and physical wellness. Even so, mental illnesses are often perceived as less legitimate or serious than physical diseases, and as a result of this stigmatization, many people suffer in silence without access to proper treatment. To address the lack of accessible healthcare information related to mental illness, the U.S. Mental Health Dashboard presents dynamic visualizations, tables, and choropleth maps of the prevalence and geographic distribution of key mental health metrics based on data from the National Survey on Drug Use and Health (NSDUH) and the Behavioral Risk Factor Surveillance System (BRFSS). National and state-level estimates are provided for the civilian, non-institutionalized adult population of the United States as well as within relevant demographic subpopulations. By demonstrating the pervasiveness of mental illness and the stark health inequities between demographic groups, this application aims to raise mental health awareness and reduce self-blame and stigmatization, especially for individuals who may be at inherently high risk. The U.S. Mental Health Dashboard has a wide variety of potential use cases: to show individuals suffering from mental illness, and those close to them, that they are not alone; to identify subpopulations with the greatest need for mental health care; and to help epidemiologists planning studies identify the target population for specific mental illness symptoms.
Variable selection in large-dimensional data has been extensively studied in different settings over the past decades. In a recent article, Shokoohi et al. [29, DOI:10.1214/18-AOAS1198] proposed a method for variable selection in finite mixture of accelerated failure time regression models for studies on time-to-event data, in order to capture heterogeneity within the population and account for censoring. In this paper, we introduce the fmrs package, which implements the variable selection methodology for such models. Furthermore, as a byproduct, the fmrs package facilitates variable selection in finite mixture regression models. The package also incorporates a tuning parameter selection mechanism based on component-wise BIC. Commonly used penalties, such as the least absolute shrinkage and selection operator (LASSO) and smoothly clipped absolute deviation (SCAD), are integrated into fmrs. Additionally, the package offers an option for non-mixture regression models. The core routines are implemented in C to boost the optimization speed. We provide an overview of the fmrs principles and the strategies employed for optimization. Hands-on illustrations are presented to help users get acquainted with fmrs. Finally, we apply fmrs to a lung cancer dataset and observe that a two-component mixture model reveals a subgroup with a more aggressive form of the disease, displaying shorter survival times.
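As a rough orientation, and in our own notation rather than the exact formulation of Shokoohi et al., the model class behind fmrs can be sketched as a $K$-component mixture of accelerated failure time regressions,
$$
\log T_i \mid (Z_i = k) \;=\; \mathbf{x}_i^\top \boldsymbol{\beta}_k + \sigma_k \varepsilon_{ik}, \qquad \Pr(Z_i = k) = \pi_k,
$$
with variable selection performed by maximizing a censoring-adjusted log-likelihood penalized component-wise, e.g. by subtracting terms of the form $\sum_{k=1}^{K}\sum_{j=1}^{p} p_{\lambda_k}(|\beta_{kj}|)$, where $p_\lambda$ is a penalty such as LASSO or SCAD and each $\lambda_k$ is tuned via the component-wise BIC mentioned above.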
Basket trials have captured much attention in oncology research in recent years, as advances in health technology have opened up the possibility of classifying patients at the genomic level. Bayesian methods are particularly prevalent in basket trials because the hierarchical structure naturally allows information borrowing across baskets. In this article, we extend the Bayesian methods to basket trials with treatment and control arms for continuous endpoints, which is often the case in clinical trials for rare diseases. To account for imbalance in covariates that are potentially strong predictors but are not stratified on in a randomized trial, our models adjust for these covariates and allow the coefficients to differ across baskets. In addition, comparisons are drawn between two-stage and one-stage designs for the four Bayesian methods. Extensive simulation studies are conducted to examine the empirical performance of all models under consideration. A real data analysis is carried out to further demonstrate the usefulness of the Bayesian methods.
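For intuition, a minimal version of such a hierarchical model, in our illustrative notation (the four Bayesian methods compared in the article differ in their priors and in how borrowing is structured), might take the form
$$
y_{ik} \;=\; \alpha_k + \theta_k T_{ik} + \mathbf{x}_{ik}^\top \boldsymbol{\beta}_k + \epsilon_{ik}, \qquad \epsilon_{ik} \sim N(0, \sigma_k^2), \qquad \theta_k \sim N(\mu_\theta, \tau^2),
$$
where $T_{ik}$ is the treatment indicator for patient $i$ in basket $k$, $\theta_k$ is the basket-specific treatment effect, $\boldsymbol{\beta}_k$ are basket-specific coefficients adjusting for the unstratified covariates, and the common prior on the $\theta_k$ is what induces information borrowing across baskets.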
In diagnostic imaging drug development, the imaging scan read data in controlled imaging drug clinical trials are classified as test positive or test negative. Broadly speaking, the standard of reference data indicate either the presence or the absence of a disease or clinical condition. Together, these data are used to assess the diagnostic performance of an investigational imaging drug in a controlled imaging drug clinical trial. For imaging scan read data that cannot be called positive/negative, the “indeterminate” category is commonly used to cover imaging results that may be considered intermediate, indeterminate, or uninterpretable. Similarly, for standard of reference data that cannot be categorized as presence/absence, including uncollected or unavailable reference standard data, the “indeterminate” category may be used. Historically, little attention has been paid to indeterminate imaging scan read data, as they are generally rare or considered irrelevant, even though they pertain to scanned subjects and can be informative. Subjects lacking the standard of reference are simply excluded, such that the study reports analysis results only in subjects with available standard of reference data, known as a completer analysis, similar to the evaluable-subject analyses seen in controlled trials for drug development.
To improve diagnostic clinical trial planning, this paper introduces five attributes of an estimand in diagnostic imaging drug clinical trials. The paper then defines the indeterminate data mechanisms and gives examples of each indeterminate mechanism specific to the clinical context of a diagnostic imaging drug clinical trial. Several imputation approaches to handling indeterminate data are discussed. Depending on the clinical question of primary interest, indeterminate data may be intercurrent events. The paper ends with discussions on imputation of intercurrent events occurring in indeterminate imaging scan read data and those occurring in indeterminate standard of reference data when encountered in diagnostic imaging clinical trials, and provides points to consider on estimands for diagnostic imaging drug development.
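As a toy illustration of why the handling of indeterminates matters, the following sketch uses entirely hypothetical counts (not from any trial) to show how sensitivity and specificity shift when indeterminate reads are excluded, imputed as test negative, or imputed as test positive:

```python
# Hypothetical 2x3 cross-tabulation of imaging reads vs. standard of reference (illustrative only).
counts = {
    ("positive", "present"): 80, ("negative", "present"): 15, ("indeterminate", "present"): 5,
    ("positive", "absent"): 10, ("negative", "absent"): 85, ("indeterminate", "absent"): 5,
}

def performance(handling):
    """Sensitivity and specificity when indeterminate reads are handled by
    'exclude', 'negative' (impute as test negative), or 'positive' (impute as test positive)."""
    tp = fn = tn = fp = 0
    for (read, truth), n in counts.items():
        if read == "indeterminate":
            if handling == "exclude":
                continue
            read = handling                      # impute the indeterminate read
        if truth == "present":
            tp, fn = (tp + n, fn) if read == "positive" else (tp, fn + n)
        else:
            tn, fp = (tn + n, fp) if read == "negative" else (tn, fp + n)
    return tp / (tp + fn), tn / (tn + fp)

for rule in ("exclude", "negative", "positive"):
    sens, spec = performance(rule)
    print(f"{rule:>9}: sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

Even in this small example, the completer-style exclusion and the two imputation rules lead to visibly different estimates of diagnostic performance, which is why the choice of handling rule belongs in the estimand discussion.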
The continuation-ratio (CR) model is frequently used in dose-response studies to model a three-category outcome as the dose levels vary. Design issues for a CR model defined on an unrestricted dose interval have been discussed for estimating model parameters or a selected function of the model parameters. This paper uses metaheuristics to address design issues for a CR model defined on any compact dose interval when there are one or more objectives in the study and some are more important than others. Specifically, we use an exemplary nature-inspired metaheuristic algorithm called particle swarm optimization (PSO) to find locally optimal designs for estimating a few interesting functions of the model parameters, such as the most effective dose ($MED$) and the maximum tolerated dose ($MTD$), and for estimating all parameters in a CR model. We demonstrate that PSO can efficiently find locally optimal multiple-objective designs for a CR model on various dose intervals, and a small simulation study shows it tends to outperform the popular deterministic cocktail algorithm (CA) and another competitive metaheuristic algorithm called differential evolution (DE). We also discuss hybrid algorithms and their flexible applications to designing early Phase 2 trials or tackling biomedical problems, such as different strategies for handling the recent pandemic.
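To make the PSO step concrete, here is a minimal, generic particle swarm sketch; it is illustrative only, and the actual design criteria for the CR model (e.g., information-matrix-based criteria for the $MED$ or $MTD$) are not reproduced here, so the objective shown is a hypothetical placeholder:

```python
import numpy as np

def pso_maximize(objective, lower, upper, n_particles=30, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Generic particle swarm search over the box [lower, upper]; maximizes `objective`."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)
    d = lower.size
    x = rng.uniform(lower, upper, size=(n_particles, d))   # particle positions (candidate designs)
    v = np.zeros_like(x)                                    # particle velocities
    pbest = x.copy()                                        # personal best positions
    pbest_val = np.array([objective(p) for p in x])
    gbest = pbest[np.argmax(pbest_val)].copy()              # global best position

    for _ in range(n_iter):
        r1 = rng.random((n_particles, d))
        r2 = rng.random((n_particles, d))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lower, upper)                    # keep particles inside the compact interval
        vals = np.array([objective(p) for p in x])
        improved = vals > pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[np.argmax(pbest_val)].copy()
    return gbest, pbest_val.max()

# Toy usage on a hypothetical smooth criterion (not a CR-model design criterion):
best_design, best_value = pso_maximize(lambda z: -np.sum((z - 1.2) ** 2),
                                       lower=[0.0, 0.0], upper=[4.0, 4.0])
```

In a design application, the box would be the compact dose interval (replicated over the support points being optimized) and the objective a weighted compound criterion reflecting the relative importance of the study objectives.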
Traditionally, research in nutritional epidemiology has focused on specific foods/food groups or single nutrients in their relation to disease outcomes, including cancer. Dietary pattern analysis has been introduced to examine the potential cumulative and interactive effects of individual dietary components of the overall diet, in which foods are consumed in combination. Dietary patterns can be identified by using evidence-based investigator-defined approaches or by using data-driven approaches, which rely on either response-independent (also named “a posteriori” dietary patterns) or response-dependent (also named “mixed-type” dietary patterns) multivariate statistical methods. Within the open methodological challenges related to study design, dietary assessment, identification of dietary patterns, confounding phenomena, and cancer risk assessment, the current paper provides an updated landscape review of novel methodological developments in the statistical analysis of a posteriori/mixed-type dietary patterns and cancer risk. The review starts from standard a posteriori dietary patterns from principal component, factor, and cluster analyses, including mixture models, and then examines mixed-type dietary patterns from reduced rank regression, partial least squares, classification and regression tree analysis, and the least absolute shrinkage and selection operator. Novel statistical approaches reviewed include Bayesian factor analysis with modeling of sparsity through shrinkage and sparse priors, and frequentist focused principal component analysis. Most novelties relate to the reproducibility of dietary patterns across studies, where the potential of the Bayesian approach to factor and cluster analysis is best realized.
A master protocol is a type of trial design in which multiple therapies and/or multiple disease populations can be investigated in the same trial. A shared control can be used for multiple therapies to gain operational efficiency and to make the trial more attractive to patients. To balance controlling the false positive rate against having adequate power for detecting true signals, the impact of the False Discovery Rate (FDR) is evaluated when multiple investigational drugs are studied in the master protocol. With the shared control group, a “random high” or “random low” in the control group can potentially affect all hypothesis tests that compare each of the test regimens with the control group, in terms of the probability of having at least one positive hypothesis outcome, or multiple positive outcomes. When regulatory agencies make the decision to approve or decline one or more regimens based on the master protocol design, this introduces a different type of error: the simultaneous false-decision error. In this manuscript, we examine in detail the derivations and properties of the simultaneous false-decision error in the master protocol with shared control under the framework of FDR. The simultaneous false-decision error consists of two parts: the simultaneous false-discovery rate (SFDR) and the simultaneous false non-discovery rate (SFNR). Based on our analytical evaluation and simulations, the magnitude of SFDR and SFNR inflation is small. Therefore, the multiple error rate controls are generally adequate; further adjustment of SFDR or SFNR to a pre-specified level, or reduction of the alpha allocated to each individual treatment comparison with the shared control, is deemed unnecessary.
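A small simulation conveys the shared-control phenomenon described above; it is illustrative only, with arbitrary sample sizes and thresholds, and is not the manuscript's analytical derivation. Under the global null, a "random low" control pushes several comparisons toward positivity at once, which is exactly what the simultaneous error rates quantify:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
K, n, alpha, n_sim = 4, 100, 0.025, 5000   # hypothetical: K arms, n per arm, one-sided alpha

def rejection_rates(shared_control):
    """Under the global null, estimate P(at least one rejection) and P(all K rejected)."""
    any_rej = all_rej = 0
    for _ in range(n_sim):
        if shared_control:
            control = rng.normal(0.0, 1.0, n)
            controls = [control] * K          # every comparison reuses the same control data
        else:
            controls = [rng.normal(0.0, 1.0, n) for _ in range(K)]
        rejected = [
            stats.ttest_ind(rng.normal(0.0, 1.0, n), c, alternative="greater").pvalue < alpha
            for c in controls
        ]
        any_rej += any(rejected)
        all_rej += all(rejected)
    return any_rej / n_sim, all_rej / n_sim

print("shared control    (any, all):", rejection_rates(True))
print("separate controls (any, all):", rejection_rates(False))
```

With the shared control, the test statistics are positively correlated, so the probability of several simultaneous false positives is noticeably higher than with separate controls, even though each individual comparison retains its nominal level.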
Community detection in networks is the process by which unusually well-connected sub-networks are identified, a central component of many applied network analyses. The paradigm of modularity quality function optimization stipulates a partition of the network’s vertices that maximizes the difference between the fraction of edges within communities and the corresponding expected fraction if edges were randomly allocated among all vertex pairs while conserving the degree distribution. The modularity quality function incorporates exclusively the network’s topology and has been extensively studied, whereas the integration of constraints or external information on community composition has largely remained unexplored. We define a greedy, recursive-backtracking search procedure to identify high-quality network communities that satisfy the global constraint that each community contain at least one vertex from a set of so-called special vertices. We apply our methodology to identify health care communities (HCCs) within a network of hospitals such that each HCC consists of at least one hospital in which at least a minimum number of cardiac defibrillator surgeries were performed. This restriction permits meaningful comparisons in cardiac care among the resulting health care communities by standardizing the distribution of cardiac care across the hospital network.
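For reference, the modularity quality function described verbally above is the standard Newman-Girvan form (written here in our notation):
$$
Q \;=\; \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j),
$$
where $A$ is the adjacency matrix, $k_i$ is the degree of vertex $i$, $m$ is the total number of edges, and $\delta(c_i, c_j) = 1$ exactly when vertices $i$ and $j$ are assigned to the same community; the second term inside the parentheses is the expected edge fraction under the degree-preserving random allocation.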
Clinical trials usually involve sequential patient entry. When designing a clinical trial, it is often desirable to include a provision for interim analyses of accumulating data with the potential for stopping the trial early. We review Bayesian sequential clinical trial designs based on posterior probabilities, posterior predictive probabilities, and decision-theoretic frameworks. A pertinent question is whether Bayesian sequential designs need to be adjusted for the planning of interim analyses. We answer this question from three perspectives: a frequentist-oriented perspective, a calibrated Bayesian perspective, and a subjective Bayesian perspective. We also provide new insights into the likelihood principle, which is commonly tied to statistical inference and decision making in sequential clinical trials. Some theoretical results are derived, and numerical studies are conducted to illustrate and assess these designs.
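As a minimal illustration of the posterior-probability designs discussed above, and of why the frequentist-oriented perspective asks for adjustment, the following sketch uses a generic Beta-Binomial setting with arbitrary thresholds and look schedule (not any specific design from the paper) to estimate the frequentist type I error when the trial stops for efficacy at the first interim look whose posterior probability exceeds a fixed cutoff:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
p0, looks, cutoff, n_sim = 0.3, [20, 40, 60, 80, 100], 0.95, 10000  # all values illustrative

def stops_for_efficacy(p_true):
    """Stop at the first look where P(p > p0 | data) > cutoff, under a Beta(1, 1) prior."""
    outcomes = rng.random(looks[-1]) < p_true
    for n in looks:
        successes = int(outcomes[:n].sum())
        post_prob = 1.0 - stats.beta.cdf(p0, 1 + successes, 1 + n - successes)
        if post_prob > cutoff:
            return True
    return False

type1 = np.mean([stops_for_efficacy(p0) for _ in range(n_sim)])
print(f"Frequentist type I error of the unadjusted sequential rule: {type1:.3f}")
```

Repeated looks with a fixed posterior-probability cutoff typically inflate the frequentist type I error above the single-look value, which is the calibration issue that the frequentist-oriented and calibrated Bayesian perspectives address; the subjective Bayesian perspective, relying on the likelihood principle, views the unadjusted rule differently.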