Basket trials have captured much attention in oncology research in recent years, as advances in health technology have opened up the possibility of classifying patients at the genomic level. Bayesian methods are particularly prevalent in basket trials because a hierarchical structure can be adapted to allow information borrowing across baskets. In this article, we extend Bayesian methods to basket trials with treatment and control arms and continuous endpoints, which is often the case in clinical trials for rare diseases. To account for imbalance in covariates that are potentially strong predictors but not stratified on in a randomized trial, our models adjust for these covariates and allow the coefficients to differ across baskets. In addition, comparisons are drawn between two-stage and one-stage designs for the four Bayesian methods. Extensive simulation studies examine the empirical performance of all models under consideration, and a real data analysis further demonstrates the usefulness of the Bayesian methods.
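To make the information-borrowing idea concrete, the following is a minimal sketch of a covariate-adjusted, basket-specific analysis followed by normal-normal shrinkage toward a common mean. It uses an empirical-Bayes approximation rather than the full Bayesian models studied in the article, and the basket counts, effect sizes, and covariate structure are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 4 baskets, each with a randomized treatment/control arm,
# a continuous endpoint, and one unstratified covariate with a basket-specific slope.
true_effects = np.array([0.5, 0.5, 0.0, 0.8])
baskets = []
for k, delta in enumerate(true_effects):
    n = 30
    trt = rng.integers(0, 2, n)        # 1 = treatment, 0 = control
    x = rng.normal(size=n)             # prognostic covariate
    y = 0.3 * (k + 1) * x + delta * trt + rng.normal(scale=1.0, size=n)
    baskets.append((y, trt, x))

# Per-basket OLS adjusting for the covariate (basket-specific coefficient).
est, var = [], []
for y, trt, x in baskets:
    X = np.column_stack([np.ones_like(y), trt, x])
    beta, res, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = res[0] / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    est.append(beta[1])
    var.append(cov[1, 1])
est, var = np.array(est), np.array(var)

# Normal-normal shrinkage toward the common mean (empirical-Bayes stand-in for a
# hierarchical model): larger between-basket variance tau2 means less borrowing.
tau2 = max(np.var(est, ddof=1) - var.mean(), 1e-6)
w = tau2 / (tau2 + var)
shrunk = w * est + (1 - w) * np.average(est, weights=1 / var)
print("per-basket estimates:", np.round(est, 2))
print("shrunken estimates:  ", np.round(shrunk, 2))
```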
In diagnostic imaging drug development, the imaging scan read data in controlled imaging drug clinical trials include test-positive and test-negative results. Broadly speaking, the standard of reference data indicate either the presence or absence of a disease or clinical condition. Together, these data are used to assess the diagnostic performance of an investigational imaging drug in a controlled imaging drug clinical trial. For imaging scan reads that cannot be called positive or negative, the “indeterminate” category is commonly used to cover results that may be considered intermediate, indeterminate, or uninterpretable. Similarly, for standard of reference data that cannot be categorized as presence/absence, including uncollected or unavailable reference standard data, the “indeterminate” category may be used. Historically, little attention has been paid to indeterminate imaging scan read data because they are generally rare or considered irrelevant, even though they relate to scanned subjects and can be informative. Subjects lacking a standard of reference are simply excluded, so the study reports analysis results only for subjects with available standard of reference data, known as a completer analysis, similar to the evaluable subjects seen in controlled trials for drug development.
To improve diagnostic clinical trial planning, this paper introduces five attributes of an estimand in diagnostic imaging drug clinical trials. The paper then defines the indeterminate data mechanisms and gives examples of each mechanism specific to the clinical context of a diagnostic imaging drug clinical trial. Several imputation approaches to handling indeterminate data are discussed. Depending on the clinical question of primary interest, indeterminate data may be intercurrent events. The paper ends with a discussion of imputation for intercurrent events occurring in indeterminate imaging scan read data and in indeterminate standard of reference data, and provides points to consider on estimands for diagnostic imaging drug development.
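As a small illustration of why the handling of indeterminate results matters, the sketch below computes sensitivity and specificity under a completer analysis versus imputing indeterminate reads as test-negative. The counts and the imputation rule are purely hypothetical and are not taken from the paper.

```python
# Hypothetical read counts against the standard of reference (disease present / absent),
# with an "indeterminate" read category; the numbers are illustrative only.
counts = {
    ("positive", "present"): 80, ("negative", "present"): 15, ("indeterminate", "present"): 5,
    ("positive", "absent"):  10, ("negative", "absent"):  85, ("indeterminate", "absent"):  5,
}

def sens_spec(counts, indeterminate_as=None):
    """Completer analysis if indeterminate_as is None; otherwise impute
    indeterminate reads as 'positive' or 'negative'."""
    c = dict(counts)
    for truth in ("present", "absent"):
        n_ind = c.pop(("indeterminate", truth))
        if indeterminate_as is not None:
            c[(indeterminate_as, truth)] += n_ind
    sens = c[("positive", "present")] / (c[("positive", "present")] + c[("negative", "present")])
    spec = c[("negative", "absent")] / (c[("negative", "absent")] + c[("positive", "absent")])
    return round(sens, 3), round(spec, 3)

print("completer analysis:        ", sens_spec(counts))
print("indeterminate -> negative: ", sens_spec(counts, indeterminate_as="negative"))
```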
Designing longitudinal studies is generally very challenging because of the complex optimization problems involved. We show that the popular nature-inspired metaheuristic algorithm, Particle Swarm Optimization (PSO), can find different types of optimal exact designs for longitudinal studies under different correlation structures and different types of models. In particular, we demonstrate that PSO-generated D-optimal longitudinal designs for the widely used Michaelis-Menten model with various correlation structures agree with the analytically derived locally D-optimal designs reported in the literature when there are only 2 observations per subject, and with the reported numerical D-optimal designs when there are 3 or 4 observations per subject. We further show the usefulness of PSO by applying it to generate new locally D-optimal designs for estimating model parameters when there are 5 or more observations per subject. Additionally, we find various optimal longitudinal designs for a growth curve model commonly used in animal studies and for a nonlinear HIV dynamic model for studying T-cells in AIDS subjects. In particular, c-optimal exact designs for estimating one or more functions of the model parameters (c-optimality) were found, along with other types of multiple-objective optimal designs.
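The sketch below illustrates the basic PSO-for-design idea on the Michaelis-Menten model in the simplest setting of independent errors (i.e., without the longitudinal correlation structures treated in the paper); the nominal parameter values, design interval, and PSO tuning constants are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup: locally D-optimal exact design for the Michaelis-Menten model
# eta(x) = V*x/(K + x) with independent errors, nominal values V = 1, K = 2,
# design space [0, 10], and m support points.
V, K, lo, hi, m = 1.0, 2.0, 0.0, 10.0, 2

def neg_log_det(points):
    f = np.column_stack([points / (K + points),             # d eta / d V
                         -V * points / (K + points) ** 2])  # d eta / d K
    M = f.T @ f                                              # information matrix
    sign, logdet = np.linalg.slogdet(M)
    return np.inf if sign <= 0 else -logdet

# Plain-vanilla PSO over the m design points.
n_particles, n_iter, w, c1, c2 = 30, 200, 0.7, 1.5, 1.5
pos = rng.uniform(lo, hi, size=(n_particles, m))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([neg_log_det(p) for p in pos])
gbest = pbest[np.argmin(pbest_val)].copy()

for _ in range(n_iter):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    vals = np.array([neg_log_det(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[np.argmin(pbest_val)].copy()

print("PSO design points:", np.sort(np.round(gbest, 3)))
```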
The continuation-ratio (CR) model is frequently used in dose-response studies to model a three-category outcome as the dose level varies. Design issues for a CR model defined on an unrestricted dose interval have been discussed for estimating model parameters or a selected function of the model parameters. This paper uses metaheuristics to address design issues for a CR model defined on any compact dose interval when there are one or more objectives in the study and some are more important than others. Specifically, we use an exemplary nature-inspired metaheuristic algorithm called particle swarm optimization (PSO) to find locally optimal designs for estimating interesting functions of the model parameters, such as the most effective dose ($MED$) and the maximum tolerated dose ($MTD$), as well as for estimating all parameters in a CR model. We demonstrate that PSO can efficiently find locally multiple-objective optimal designs for a CR model on various dose intervals, and a small simulation study shows it tends to outperform the popular deterministic cocktail algorithm (CA) and another competitive metaheuristic algorithm, differential evolution (DE). We also discuss hybrid algorithms and their flexible applications to designing early Phase 2 trials or tackling biomedical problems, such as different strategies for handling the recent pandemic.
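For reference, one common parametrization of the continuation-ratio model for a three-category outcome (e.g., no reaction, efficacy without toxicity, toxicity) at dose $x$ writes the two continuation-ratio logits as linear in dose; the exact parametrization and the target functions used in the paper may differ:

$$\log\frac{\pi_1(x)}{\pi_2(x)+\pi_3(x)} = \alpha_1 + \beta_1 x, \qquad \log\frac{\pi_2(x)}{\pi_3(x)} = \alpha_2 + \beta_2 x, \qquad \pi_1(x)+\pi_2(x)+\pi_3(x)=1.$$

Quantities such as the $MED$ and $MTD$ are then nonlinear functions of $(\alpha_1,\beta_1,\alpha_2,\beta_2)$, which is what makes c-optimal and multiple-objective designs relevant.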
Crossover models and interference models are frequently used in clinical trials, agricultural studies, social studies, and elsewhere. While some theoretical optimality results are available, it is still challenging to apply them in practice. Because exact optimal designs are complex, the available theoretical results typically require specific combinations of the number of treatments ($t$), periods ($p$), and subjects ($n$). A more flexible approach is to build integer programming on approximate design theory, which can handle general cases of $(t,p,n)$. Nonetheless, such results are generally derived for specific models or design problems, and new efforts are needed for new problems. These obstacles make the application of the theoretical results rather difficult. Here we propose a new algorithm, a revision of an existing optimal weight exchange algorithm. It quickly provides efficient crossover designs under various situations, for different optimality criteria, different parameters of interest, different configurations of $(t,p,n)$, as well as arbitrary dropout scenarios. To facilitate the use of our algorithm, the corresponding R package and an R Shiny app providing a more user-friendly interface have been developed.
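As a small illustration of the kind of quantity such algorithms optimize, the sketch below evaluates the information matrix for direct treatment effects of a candidate crossover design under a textbook fixed-effects model with subject, period, and first-order carryover terms and i.i.d. errors; the model, design, and dimensions are simpler than the general settings handled by the proposed algorithm.

```python
import numpy as np

def info_direct_effects(sequences, t):
    """Information matrix for direct treatment effects in a crossover design,
    after projecting out subject, period, and first-order carryover terms.
    `sequences` is an (n x p) array of treatment labels 0..t-1."""
    n, p = sequences.shape
    Td, Tc, S, P = [], [], [], []
    for i in range(n):
        for j in range(p):
            d = np.zeros(t); d[sequences[i, j]] = 1      # direct treatment effect
            c = np.zeros(t)
            if j > 0:
                c[sequences[i, j - 1]] = 1               # first-order carryover
            s = np.zeros(n); s[i] = 1                    # subject effect
            per = np.zeros(p); per[j] = 1                # period effect
            Td.append(d); Tc.append(c); S.append(s); P.append(per)
    Td = np.array(Td)
    Z = np.column_stack([np.array(Tc), np.array(S), np.array(P)])  # nuisance terms
    PZ = Z @ np.linalg.pinv(Z)                           # projection onto nuisance space
    return Td.T @ (np.eye(len(Td)) - PZ) @ Td

# Example: the classical AB/BA design with 4 subjects, t = 2 treatments, p = 2 periods.
seqs = np.array([[0, 1], [1, 0], [0, 1], [1, 0]])
print(np.round(info_direct_effects(seqs, t=2), 3))
```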
Subdata selection from big data is an active area of research that facilitates inference from big data at limited computational expense. For linear regression models, the optimal design-inspired Information-Based Optimal Subdata Selection (IBOSS) method is a computationally efficient method for selecting subdata with excellent statistical properties. But the method can only be used if the subdata size, $k$, is at least twice the number of regression variables, $p$. In addition, even when $k\ge 2p$, under the assumption of effect sparsity one can expect to obtain subdata with better statistical properties by focusing on active variables. Inspired by recent efforts to extend the IBOSS method to situations with a large number of variables $p$, we introduce a method called Combining Lasso And Subdata Selection (CLASS) that, as we show, improves on other proposed methods for variable selection and for building a predictive model from subdata when the full data size $n$ is very large and the number of variables $p$ is large. In terms of computational expense, CLASS is more expensive than recent competitors for moderately large values of $n$, but the roles reverse under effect sparsity for extremely large values of $n$.
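A minimal sketch of the two-step idea behind combining the Lasso with IBOSS-style subdata selection is given below; the pilot-sample screening, tuning parameter, and data-generating setup are hypothetical simplifications, not the paper's exact CLASS algorithm.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)

# Synthetic big-data problem with effect sparsity: only 5 of 50 variables are active.
n, p, k = 100_000, 50, 1_000
beta = np.zeros(p); beta[:5] = 1.0
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)

# Step 1: Lasso screening of active variables on a small pilot subsample.
pilot = rng.choice(n, 2_000, replace=False)
active = np.flatnonzero(Lasso(alpha=0.05).fit(X[pilot], y[pilot]).coef_ != 0)

# Step 2: IBOSS-style selection -- for each screened variable, keep the rows with
# the smallest and largest values, which tends to maximize information.
r = max(k // (2 * len(active)), 1)
keep = set()
for j in active:
    order = np.argsort(X[:, j])
    keep.update(order[:r]); keep.update(order[-r:])
idx = np.array(sorted(keep))

# Fit OLS on the selected subdata, restricted to the screened variables.
Xs = np.column_stack([np.ones(len(idx)), X[np.ix_(idx, active)]])
coef, *_ = np.linalg.lstsq(Xs, y[idx], rcond=None)
print("screened variables:", active, "| subdata size:", len(idx))
print("estimated active coefficients:", np.round(coef[1:], 3))
```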
The supersaturated design is often used to discover important factors in an experiment with a large number of factors and a small number of runs. We propose a method for constructing supersaturated designs with small coherence. Such designs are useful for variable selection methods such as the Lasso. Examples are provided to illustrate the proposed method.
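For concreteness, coherence here refers to the largest absolute normalized inner product between distinct columns of the design matrix. The toy sketch below computes it and searches over random two-level designs; the construction proposed in the paper is structured, not random.

```python
import numpy as np

def coherence(D):
    """Maximum absolute correlation (coherence) between distinct columns of a
    two-level design matrix D with entries +/-1 and n runs."""
    G = D.T @ D / D.shape[0]
    np.fill_diagonal(G, 0.0)
    return np.abs(G).max()

# Toy illustration: random +/-1 supersaturated designs with n = 12 runs and
# m = 20 factors, keeping the one with the smallest coherence found.
rng = np.random.default_rng(3)
best_coh = np.inf
for _ in range(5_000):
    D = rng.choice([-1.0, 1.0], size=(12, 20))
    best_coh = min(best_coh, coherence(D))
print("smallest coherence found:", round(best_coh, 3))
```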
Traditionally, research in nutritional epidemiology has focused on specific foods/food groups or single nutrients in their relation to disease outcomes, including cancer. Dietary pattern analysis was introduced to examine potential cumulative and interactive effects of individual dietary components of the overall diet, in which foods are consumed in combination. Dietary patterns can be identified by using evidence-based, investigator-defined approaches or by using data-driven approaches, which rely on either response-independent (also called “a posteriori” dietary patterns) or response-dependent (also called “mixed-type” dietary patterns) multivariate statistical methods. Against the open methodological challenges related to study design, dietary assessment, identification of dietary patterns, confounding phenomena, and cancer risk assessment, the current paper provides an updated landscape review of novel methodological developments in the statistical analysis of a posteriori/mixed-type dietary patterns and cancer risk. The review starts from standard a posteriori dietary patterns from principal component, factor, and cluster analyses, including mixture models, and examines mixed-type dietary patterns from reduced rank regression, partial least squares, classification and regression tree analysis, and the least absolute shrinkage and selection operator. Novel statistical approaches reviewed include Bayesian factor analysis with modeling of sparsity through shrinkage and sparse priors, and frequentist focused principal component analysis. Most novelties relate to the reproducibility of dietary patterns across studies, where the potential of the Bayesian approach to factor and cluster analysis is best realized.
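A minimal sketch of the standard a posteriori approach, principal component analysis of standardized food-group intakes, is shown below on synthetic data; the food groups, sample size, and induced correlation structure are made up for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)

# Synthetic food-group intakes; induce a correlated "prudent-like" block so that a
# dietary pattern exists to be recovered.
foods = ["vegetables", "fruit", "red_meat", "processed_meat", "whole_grains", "sweets"]
n = 500
intake = rng.gamma(shape=2.0, scale=1.0, size=(n, len(foods)))
intake[:, 1] += 0.8 * intake[:, 0]
intake[:, 4] += 0.6 * intake[:, 0]

# A posteriori patterns: principal components of standardized intakes.
Z = StandardScaler().fit_transform(intake)
pca = PCA(n_components=2).fit(Z)
for i, load in enumerate(pca.components_):
    top = sorted(zip(foods, np.round(load, 2)), key=lambda t: -abs(t[1]))
    print(f"pattern {i + 1} (explains {pca.explained_variance_ratio_[i]:.0%} of variance):", top[:3])
```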
In online experimentation, appropriate metrics (e.g., purchase) provide strong evidence to support hypotheses and enhance the decision-making process. However, incomplete metrics frequently occur in online experimentation, leaving far less data available than planned for the online experiments (e.g., A/B tests). In this work, we introduce the concept of dropout buyers and categorize users with incomplete metric values into two groups: visitors and dropout buyers. For the analysis of incomplete metrics, we propose a clustering-based imputation method using k-nearest neighbors. Our proposed imputation method considers both experiment-specific features and users’ activities along their shopping paths, allowing different imputation values for different users. To facilitate efficient imputation for large-scale data sets in online experimentation, the proposed method uses a combination of stratification and clustering. The performance of the proposed method is compared to several conventional methods in both simulation studies and a real online experiment at eBay.
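A stripped-down sketch of the k-nearest-neighbor imputation idea is given below; it omits the stratification and clustering steps the method uses for scalability, and the activity features and data-generating setup are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(5)

# Hypothetical user-level activity features and a purchase metric that is missing
# for roughly 30% of users (the incomplete-metric problem described above).
n = 2_000
features = np.column_stack([
    rng.poisson(5, n),        # page views
    rng.poisson(2, n),        # add-to-cart events
    rng.integers(0, 2, n),    # returning-user flag
]).astype(float)
purchase = 5.0 + 0.8 * features[:, 1] + rng.normal(scale=2.0, size=n)
observed = rng.random(n) > 0.3

# For each user with a missing metric, average the metric of the k most similar
# users (by activity features) among those with observed values.
k = 10
nn = NearestNeighbors(n_neighbors=k).fit(features[observed])
_, neighbor_idx = nn.kneighbors(features[~observed])
imputed = purchase[observed][neighbor_idx].mean(axis=1)

filled = purchase.copy()
filled[~observed] = imputed
print("mean metric (observed only):     ", round(purchase[observed].mean(), 2))
print("mean metric (after kNN imputation):", round(filled.mean(), 2))
```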
A master protocol is a type of trial design in which multiple therapies and/or multiple disease populations can be investigated in the same trial. A shared control can be used for multiple therapies to gain operational efficiency and make the trial more attractive to patients. To balance controlling the false positive rate against having adequate power for detecting true signals, the impact on the False Discovery Rate (FDR) is evaluated when multiple investigational drugs are studied in the master protocol. With the shared control group, a “random high” or “random low” in the control group can potentially impact all hypothesis tests comparing each test regimen with the control group, in terms of the probability of having at least one positive hypothesis outcome, or multiple positive outcomes. When regulatory agencies make the decision to approve or decline one or more regimens based on the master protocol design, this introduces a different type of error: the simultaneous false-decision error. In this manuscript, we examine in detail the derivations and properties of the simultaneous false-decision error in the master protocol with shared control under the FDR framework. The simultaneous false-decision error consists of two parts: the simultaneous false-discovery rate (SFDR) and the simultaneous false non-discovery rate (SFNR). Based on our analytical evaluations and simulations, the magnitude of SFDR and SFNR inflation is small. Therefore, the multiple error rate controls are generally adequate; further adjusting SFDR or SFNR to a pre-specified level, or reducing the alpha allocated to each individual treatment comparison with the shared control, is deemed unnecessary.
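The Monte Carlo sketch below illustrates, but does not reproduce, the phenomenon discussed here: a shared control’s “random high” or “random low” correlates the test statistics across arms, so simultaneous false decisions occur more often than independent tests would suggest. All design parameters are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)

# Four investigational arms compared with one shared control; the first two arms
# are truly null. One-sided z-tests at alpha = 0.025 per comparison.
n_per_arm, n_arms, n_sims = 100, 4, 20_000
effects = np.array([0.0, 0.0, 0.4, 0.4])

at_least_one_false, both_nulls_rejected = 0, 0
for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, n_per_arm)
    z = np.empty(n_arms)
    for a in range(n_arms):
        treat = rng.normal(effects[a], 1.0, n_per_arm)
        z[a] = (treat.mean() - control.mean()) / np.sqrt(2 / n_per_arm)
    reject = z > 1.96
    at_least_one_false += reject[:2].any()
    both_nulls_rejected += reject[:2].all()

print("P(at least one false positive):   ", round(at_least_one_false / n_sims, 4))
print("P(both null arms falsely positive):", round(both_nulls_rejected / n_sims, 4))
```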