In cancer research, leveraging patient-derived xenografts (PDXs) in pre-clinical experiments is a crucial approach for assessing innovative therapeutic strategies. Addressing the inherent variability in treatment response among and within individual PDX lines is essential. However, the current literature lacks a user-friendly statistical power analysis tool capable of concurrently determining the required number of PDX lines and animals per line per treatment group in this context. In this paper, we present a simulation-based R package for sample size determination, named ‘PDXpower’, which is publicly available on the Comprehensive R Archive Network (https://CRAN.R-project.org/package=PDXpower). The package is designed to estimate the number of PDX lines and animals per line per treatment group needed when designing a PDX experiment, whether the outcome is uncensored or a censored time-to-event. Our sample size considerations rely on two widely used analytical frameworks: the mixed-effects ANOVA model for uncensored outcomes and Cox’s frailty model for censored outcomes, both of which account for inter-PDX variability and intra-PDX correlation in treatment response. Step-by-step illustrations of how to use the package are provided, covering scenarios with or without preliminary data.
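As a rough illustration of the simulation-based approach described above (not the PDXpower interface itself, whose functions are documented on CRAN), the following sketch estimates power for a hypothetical design with n_line PDX lines and n_animal animals per line per arm, using lme4 for the mixed-effects ANOVA analysis; all parameter values are assumptions for illustration, and an analogous loop with a Cox frailty fit would cover the censored case.

    ## Minimal sketch of simulation-based power estimation for a PDX design
    ## (illustrative only; see the PDXpower manual for the package's own functions).
    library(lme4)

    simulate_power <- function(n_line = 5, n_animal = 3, beta = -1,
                               sd_line = 0.5, sd_error = 1, n_sim = 200) {
      rejections <- replicate(n_sim, {
        d <- data.frame(
          line = factor(rep(seq_len(n_line), each = 2 * n_animal)),
          trt  = rep(rep(c(0, 1), each = n_animal), times = n_line)
        )
        u   <- rnorm(n_line, sd = sd_line)                       # PDX-line random effects
        d$y <- beta * d$trt + u[as.integer(d$line)] + rnorm(nrow(d), sd = sd_error)
        fit  <- lmer(y ~ trt + (1 | line), data = d, REML = FALSE)
        null <- lmer(y ~ 1   + (1 | line), data = d, REML = FALSE)
        anova(null, fit)[2, "Pr(>Chisq)"] < 0.05                 # LRT for the treatment effect
      })
      mean(rejections)                                           # empirical power
    }

    simulate_power(n_line = 5, n_animal = 3)
    ## For a censored endpoint, replace the lmer fits with, e.g.,
    ## survival::coxph(Surv(time, status) ~ trt + frailty(line), data = d).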
Up-and-Down designs (UDDs) are ubiquitous for dose-finding in a wide variety of scientific, engineering, and clinical fields. They are defined by a few simple rules that generate a random walk around the target percentile. UDDs’ combination of robust, tractable behavior, straightforward usage, and good dose-finding performance has won the trust of practitioners and their consulting analysts across fields and continents. In contrast, in recent decades the statistical dose-finding design field has given UDDs the cold shoulder, and it is quite possible that many younger dose-finding methods researchers are not even aware of this design approach.
We present a concise overview of UDDs and their current state-of-the-art methodology, with references for further inquiry. We also revisit the performance comparison between UDDs and novel, more complicated design approaches such as the Continual Reassessment Method and the Bayesian Optimal Interval design, which we group under the term “Aim-for-Target” designs. UDDs fare very well in the comparison, particularly in terms of robustness to sources of variability.
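To make the “few simple rules” concrete, here is a minimal sketch of the classical up-and-down rule, which steps down one dose level after a toxicity and up one level after a non-toxicity, so allocations concentrate around the dose with roughly 50% toxicity; the dose-toxicity curve and all settings below are hypothetical, and variants such as the biased-coin and k-in-a-row rules target other percentiles.

    ## Minimal sketch of the classical (median-targeting) up-and-down rule.
    set.seed(1)
    true_tox <- plogis(seq(-3, 3, length.out = 8))   # assumed P(toxicity) at 8 dose levels

    run_udd <- function(n = 30, start = 1) {
      level <- integer(n); tox <- logical(n); d <- start
      for (i in seq_len(n)) {
        level[i] <- d
        tox[i]   <- runif(1) < true_tox[d]
        ## the random-walk rule: down after a toxicity, up after a non-toxicity
        d <- if (tox[i]) max(d - 1, 1) else min(d + 1, length(true_tox))
      }
      data.frame(level, tox)
    }

    trace <- run_udd()
    table(trace$level)   # allocations cluster around the level with ~50% toxicity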
In the last two decades, single-arm trials (SATs) have been used effectively to study anticancer therapies in well-defined patient populations, using durable response rate as an objective and interpretable study endpoint. With the growing trend of regulatory accelerated approval (AA) requiring randomized controlled trials (RCTs), some confusion has arisen about the roles of SATs in AA. This review is intended to elucidate necessary and desirable conditions under which an SAT may be considered appropriate for AA. Specifically, the paper describes (1) two necessary conditions for designing an SAT, (2) eight desirable conditions that help either optimize the study design and doses or interpret the study results, and (3) three additional considerations for the construction of estimands, adaptive designs, and timely communication with relevant regulatory agencies. Three examples are presented to demonstrate how SATs can or cannot provide sufficient evidence to support regulatory decisions. The conditions and considerations presented in this review may serve as a set of references for sponsors considering SATs to support regulatory approval of anticancer drugs.
Supervised dimension reduction (SDR) has been a topic of growing interest in data science, as it enables the reduction of high-dimensional covariates while preserving the functional relation with certain response variables of interest. However, existing SDR methods are not suitable for analyzing datasets collected from case-control studies. In this setting, the goal is to learn and exploit the low-dimensional structure unique to or enriched in the case group, also known as the foreground group. While some unsupervised techniques, such as the contrastive latent variable model and its variants, have been developed for this purpose, they fail to preserve the functional relationship between the dimension-reduced covariates and the response variable. In this paper, we propose a supervised dimension reduction method called contrastive inverse regression (CIR), specifically designed for the contrastive setting. CIR introduces an optimization problem defined on the Stiefel manifold with a non-standard loss function. We prove the convergence of CIR to a local optimum using a gradient descent-based algorithm, and our numerical study empirically demonstrates its improved performance over competing methods on high-dimensional data.
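Since the abstract does not spell out the CIR loss, the sketch below only illustrates the generic ingredient it mentions: gradient descent on the Stiefel manifold, here with a tangent-space projection and a QR retraction applied to a stand-in quadratic objective rather than the CIR objective; step size and dimensions are arbitrary illustrative choices.

    ## Generic sketch of gradient descent on the Stiefel manifold
    ## St(p, d) = { V : V'V = I }; the quadratic loss is a stand-in, not the CIR loss.
    stiefel_gd <- function(V0, grad_fn, step = 0.01, n_iter = 500) {
      V <- V0
      for (i in seq_len(n_iter)) {
        G <- grad_fn(V)
        ## Riemannian gradient: project the Euclidean gradient onto the tangent space
        R <- G - V %*% ((t(V) %*% G + t(G) %*% V) / 2)
        ## QR retraction back onto the manifold
        V <- qr.Q(qr(V - step * R))
      }
      V
    }

    ## Toy example: minimize -tr(V' A V); the optimum spans A's top eigenvectors.
    p <- 10; d <- 2
    A  <- crossprod(matrix(rnorm(p * p), p)) / p
    V0 <- qr.Q(qr(matrix(rnorm(p * d), p)))
    V  <- stiefel_gd(V0, grad_fn = function(V) -2 * A %*% V)
    crossprod(V)   # approximately the identity: V stays on the manifold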
Fair Machine Learning endeavors to prevent unfairness arising in the context of machine learning applications embedded in society. To this end, several mathematical fairness notions have been proposed. The best-known and most widely used notions turn out to be expressed in terms of statistical independence, which is taken to be a primitive and unambiguous notion. However, two choices remain, and are largely unexamined to date: what exactly is the meaning of statistical independence, and what are the groups to which we ought to be fair? We answer both questions by leveraging Richard von Mises’ theory of probability, which starts with data and builds the machinery of probability from the ground up. In particular, his theory places a relative definition of randomness as statistical independence at the center of statistical modelling, much in contrast to the classically used absolute i.i.d. randomness, which turns out to be “orthogonal” to his conception. We show how von Mises’ frequential modeling approach fits the problem of fair machine learning well, and how his theory (suitably interpreted) demonstrates the equivalence between the contestability of the choice of groups in the fairness criterion and the contestability of the choice of relative randomness. We thus conclude that the problem of being fair in machine learning is precisely as hard as the problem of defining what is meant by being random. In both cases there is a consequential choice, yet no universal “right” choice is possible.
Variable rate irrigation (VRI) seeks to increase the efficiency of irrigation by spatially adjusting water output within an agricultural field. Central to the success of VRI technology is establishing homogeneous irrigation zones. In this research, we propose a fusion of statistical modeling and deep learning, using artificial neural networks to map irrigation zones from simple-to-measure predictors. We further couple our neural network model with spatial correlation to capture smooth variation in the irrigation zones. We demonstrate the effectiveness of our model in defining irrigation zones for a winter wheat farm in Rexburg, Idaho.
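A minimal sketch of the first ingredient (a small neural network mapping simple-to-measure predictors to zone labels) is given below using the nnet package on simulated data; the predictor names and the “true” zones are hypothetical, and the paper’s spatial-correlation component is not included.

    ## Minimal sketch: small neural network classifier for irrigation zones
    ## (hypothetical predictors; the spatial-correlation component is omitted).
    library(nnet)

    set.seed(42)
    n <- 500
    field <- data.frame(
      elevation = runif(n),   # hypothetical easy-to-measure predictors
      slope     = runif(n),
      ec        = runif(n)    # e.g. apparent soil electrical conductivity
    )
    field$zone <- cut(field$elevation + 0.5 * field$ec, 3,
                      labels = c("low", "mid", "high"))   # hypothetical zone labels

    fit <- nnet(zone ~ elevation + slope + ec, data = field,
                size = 5, decay = 0.01, maxit = 500, trace = FALSE)
    table(predicted = predict(fit, newdata = field, type = "class"),
          observed  = field$zone)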
Growth curve analysis (GCA) has a wide range of applications in various fields where growth trajectories need to be modeled. Heteroscedasticity is often present in the error term and cannot be handled with sufficient flexibility by standard linear fixed- or mixed-effects models. One situation that has been addressed is that in which the error variance is characterized by a linear predictor involving certain covariates. A frequently encountered scenario in GCA, however, is one in which the variance is a smooth function of the mean with known shape restrictions. A naive application of standard linear mixed-effects models would underestimate the variance of the fixed-effects estimators and, consequently, the uncertainty of the estimated growth curve. We propose to model the variance of the response variable as a shape-restricted (increasing/decreasing; convex/concave) function of the marginal or conditional mean using shape-restricted splines. A simple iteratively reweighted fitting algorithm that takes advantage of existing software for linear mixed-effects models is developed. For inference, a parametric bootstrap procedure is recommended. Our simulation study shows that the proposed method gives satisfactory inference with moderate sample sizes. The utility of the method is demonstrated using two real-world applications.
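The following is a rough sketch of the iteratively reweighted idea under stated assumptions: the variance is taken to be an increasing function of the conditional mean, estimated here with isotonic regression as a simple stand-in for the paper’s shape-restricted splines, and lme4 supplies the mixed-model fits; the function name and the sleepstudy example are purely illustrative.

    ## Sketch of an iteratively reweighted fit: estimate an increasing
    ## variance-vs-mean function from squared residuals, then refit with weights.
    ## Isotonic regression stands in for the paper's shape-restricted splines.
    library(lme4)

    irls_varfun <- function(formula, data, n_iter = 5) {
      data$.w <- rep(1, nrow(data))
      fit <- NULL
      for (k in seq_len(n_iter)) {
        fit <- lmer(formula, data = data, weights = .w, REML = TRUE)
        mu  <- fitted(fit)
        r2  <- residuals(fit)^2
        iso <- isoreg(mu, r2)                            # increasing variance function
        vhat <- iso$yf[rank(mu, ties.method = "first")]  # fitted values in original order
        data$.w <- 1 / pmax(vhat, 1e-8)                  # inverse-variance weights
      }
      fit
    }

    ## e.g. (purely illustrative) with lme4's sleepstudy data:
    fit <- irls_varfun(Reaction ~ Days + (Days | Subject), lme4::sleepstudy)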
Cure models are increasingly popular for modeling time-to-event data in different forms of cancer for which a considerable proportion of patients are considered “cured.” Two types of cure models are widely used: the mixture cure model (MCM) and the promotion time cure model (PTCM). In this article, we propose a unified estimand Δ for comparing treatment and control groups under survival models with a cure fraction, which focuses on whether the treatment extends survival for patients. In addition, we introduce a general framework for Bayesian inference under cure models. Simulation studies demonstrate that, regardless of whether the model is correctly specified, inference on the unified estimand Δ yields desirable empirical performance. We analyze the ECOG melanoma data E1684 via the unified estimand Δ under different models to further demonstrate the proposed methodology.
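The abstract does not define Δ, so the sketch below only illustrates the mixture cure model structure S(t) = π + (1 − π) S_u(t) by simulating from it under assumed parameter values; a Kaplan–Meier plot then shows the characteristic plateau near the cure fractions.

    ## Sketch of the mixture cure model structure S(t) = pi + (1 - pi) * Su(t):
    ## a cured fraction never experiences the event; the rest follow a latency model.
    ## All parameter values below are assumptions for illustration.
    library(survival)
    set.seed(7)

    n       <- 1000
    trt     <- rbinom(n, 1, 0.5)
    pi_cure <- plogis(-0.5 + 0.8 * trt)              # cure probability (incidence part)
    cured   <- rbinom(n, 1, pi_cure) == 1
    t_event <- rexp(n, rate = exp(0.2 - 0.3 * trt))  # latency model for the susceptible
    t_cens  <- runif(n, 0, 6)                        # censoring times
    time    <- ifelse(cured, t_cens, pmin(t_event, t_cens))
    status  <- ifelse(cured, 0L, as.integer(t_event <= t_cens))

    plot(survfit(Surv(time, status) ~ trt), lty = 1:2)  # curves plateau near the cure fractions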
The purpose of this paper is to develop a practical framework for the analysis of linear mixed-effects models for censored or missing data with serially correlated errors, using the multivariate Student’s t-distribution as a flexible alternative to the corresponding normal distribution. We propose an efficient ECM algorithm for computing the maximum likelihood estimates for these models, producing the standard errors of the fixed effects and the likelihood function as by-products. This algorithm uses closed-form expressions at the E-step, which rely on formulas for the mean and variance of a truncated multivariate Student’s t-distribution. To illustrate the usefulness of the proposed methodology, an artificial dataset and a real dataset are analyzed. The proposed algorithm and methods are implemented in the R package ARpLMEC.
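The ARpLMEC interface is not reproduced here; as a hedged illustration, the sketch below simulates the kind of data the model targets, namely longitudinal responses with a subject-level random intercept, within-subject AR(1) errors given heavy tails through a Student-t scale mixture, and left censoring at a detection limit. All settings are assumptions.

    ## Sketch: simulate data of the kind the model targets -- a random intercept,
    ## within-subject AR(1) errors with t-like heavy tails (a shared scale mixture),
    ## and left censoring at a detection limit. Fitting would then be done with ARpLMEC.
    set.seed(3)
    n_sub <- 30; n_obs <- 6; rho <- 0.6; nu <- 4     # illustrative settings

    dat <- do.call(rbind, lapply(seq_len(n_sub), function(i) {
      t <- 0:(n_obs - 1)
      b <- rnorm(1, sd = 0.8)                              # random intercept
      e <- as.numeric(arima.sim(list(ar = rho), n_obs))    # AR(1) normal errors
      e <- e / sqrt(rchisq(1, nu) / nu)                    # shared scale -> multivariate-t errors
      data.frame(id = i, time = t, y = 1 + 0.5 * t + b + e)
    }))

    lod <- quantile(dat$y, 0.15)        # detection limit: roughly 15% left-censored
    dat$censored <- dat$y < lod
    dat$y_obs    <- pmax(dat$y, lod)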
The tumor microenvironment (TME) is a complex and dynamic ecosystem that involves interactions between different cell types, such as cancer cells, immune cells, and stromal cells. These interactions can promote or inhibit tumor growth and affect response to therapy. Multitype Gibbs point process (MGPP) models are statistical models used to study the spatial distribution and interaction of different types of objects, such as the distribution of cell types in a tissue sample. Such models are potentially useful for investigating the spatial relationships between different cell types in the tumor microenvironment, but so far studies of the TME using cell-resolution imaging have been largely limited to spatial descriptive statistics. MGPP models, however, have many advantages over descriptive statistics, such as uncertainty quantification, the incorporation of multiple covariates, and the ability to make predictions. In this paper, we describe and apply a previously developed MGPP method, the saturated pairwise interaction Gibbs point process model, to a publicly available multiplexed imaging dataset obtained from colorectal cancer patients. Importantly, we show how these methods can be used as joint species distribution models (JSDMs) to precisely frame and answer many relevant questions related to the ecology of the tumor microenvironment.
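As a hedged illustration of fitting a multitype Gibbs model to a marked point pattern in R, the sketch below uses spatstat’s multitype Strauss interaction on a built-in two-type cell pattern; this is a standard stand-in, not the saturated pairwise interaction model used in the paper, and the interaction radius is an arbitrary choice.

    ## Sketch: fit a standard multitype Gibbs model (multitype Strauss) with spatstat,
    ## as a simple stand-in for the saturated pairwise interaction model in the paper.
    library(spatstat)

    X <- amacrine                          # built-in two-type ("off"/"on") cell pattern
    r <- matrix(0.06, 2, 2)                # assumed interaction radii for each type pair
    fit <- ppm(X ~ marks, MultiStrauss(radii = r))
    summary(fit)                           # trend and pairwise interaction coefficients
    ## predict(fit) would give fitted intensity surfaces, one per cell type.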