Phase I trials investigate the toxicity profile of a new treatment and identify the maximum tolerated dose for further evaluation. Most phase I trials use a binary dose-limiting toxicity endpoint to summarize the toxicity profile of a dose. In reality, reported toxicity information is much more abundant, including various types and grades of adverse events. Building upon the i3+3 design (Liu et al., 2020), we propose the Ti3+3 design, in which the letter “T” represents “total” toxicity. The proposed design accounts for multiple toxicity types and grades by computing a toxicity burden at each dose. The Ti3+3 design aims to achieve desirable operating characteristics using a simple statistical framework based on a “toxicity burden interval” (TBI). Simulation results show that Ti3+3 performs comparably to existing, more complex designs.
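To make the idea concrete, the following is a hypothetical sketch of a toxicity-burden calculation and an interval-based dose decision. The severity weights, interval bounds, and decision rule here are illustrative assumptions, not the Ti3+3 specification; the burden is taken as a weighted sum over adverse-event types and grades.

```python
# Hypothetical toxicity-burden sketch (weights, interval bounds, and the
# decision rule are illustrative assumptions, NOT the Ti3+3 specification).
weights = {  # assumed severity weight per (adverse-event type, grade)
    ("neutropenia", 3): 0.5, ("neutropenia", 4): 1.0,
    ("nausea", 3): 0.3, ("nausea", 4): 0.6,
}

def toxicity_burden(events):
    """events: list of (AE type, grade) pairs observed in one patient."""
    return sum(weights.get(e, 0.0) for e in events)

def dose_decision(patients, lower=0.25, upper=0.35):
    """Compare the mean toxicity burden at a dose with a target interval."""
    mean_burden = sum(toxicity_burden(p) for p in patients) / len(patients)
    if mean_burden < lower:
        return "escalate"        # dose appears under-toxic
    if mean_burden > upper:
        return "de-escalate"     # dose appears over-toxic
    return "stay"                # burden falls inside the target interval

# One patient with grade-3 nausea, one with no events, one with grade-3
# neutropenia: mean burden (0.3 + 0 + 0.5) / 3 lies inside the interval.
cohort = [[("nausea", 3)], [], [("neutropenia", 3)]]
print(dose_decision(cohort))
```

The interval comparison mirrors the rule-based spirit of i3+3-style designs: only the location of the observed burden relative to the target interval drives the decision.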
Random forests are a powerful machine learning tool that captures complex relationships between independent variables and an outcome of interest. Trees built in a random forest depend on several hyperparameters, one of the most critical being node size. The original algorithm of Breiman controls node size by limiting the size of the parent node, so that a node cannot be split if it has fewer than a specified number of observations. We propose that this hyperparameter should instead be defined as the minimum number of observations in each terminal node. The two random forest approaches are compared in the regression context based on estimated generalization error, squared bias, and variance of the resulting predictions across a number of simulated datasets. Additionally, the two approaches are applied to type 2 diabetes data obtained from the National Health and Nutrition Examination Survey. We have developed a straightforward method for incorporating weights into the random forest analysis of survey data. Our results demonstrate that generalization error under the proposed approach is competitive with that attained by the original random forest approach when the data have large random error variability. The R code created for this work is available and includes an illustration.
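The distinction between the two node-size controls can be illustrated with scikit-learn’s `RandomForestRegressor` (a sketch using a standard library, not the authors’ R code): `min_samples_split` bounds the size of a parent node before it may be split, while `min_samples_leaf` enforces a minimum count in every terminal node, matching the proposed definition.

```python
# Contrast of the two node-size controls in scikit-learn (illustrative
# data; not the authors' implementation, which is in R).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=1.0, size=500)

# Breiman-style control: a node with fewer than 10 observations is not split,
# but a split of a size-10 node may still produce very small leaves.
rf_parent = RandomForestRegressor(min_samples_split=10, random_state=0).fit(X, y)

# Proposed-style control: every terminal node must hold at least 5 observations
# (bootstrap=False so each tree trains on the full sample, making the
# leaf-size guarantee easy to verify below).
rf_leaf = RandomForestRegressor(
    min_samples_leaf=5, bootstrap=False, random_state=0
).fit(X, y)

# Every leaf of the first tree honours the minimum terminal-node size.
leaf_sizes = np.bincount(rf_leaf.estimators_[0].apply(X))
print(min(leaf_sizes[leaf_sizes > 0]))  # never below 5
```

Under the parent-node rule, leaf sizes are only indirectly constrained; the terminal-node rule controls them directly, which is the crux of the comparison in the abstract.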
There are many cases in which one has continuous flows over networks, and there is interest in predicting and monitoring such flows. This paper provides Bayesian models for two types of networks—those in which flow can be bidirectional, and those in which flow is unidirectional. The former is illustrated by an application to electrical transmission over the power grid, and the latter is examined with data on volumetric water flow in a river system. Both applications yield good predictive accuracy over short time horizons. Predictive accuracy is important in these applications—it improves the efficiency of the energy market and enables flood warnings and water management.
In this paper, we build a mechanistic system to understand the relationship between a reduction in human mobility and COVID-19 spread dynamics within New York City. To this end, we propose a multivariate compartmental system that jointly models smartphone mobility data and case counts during the first 90 days of the epidemic. Parameter calibration is achieved through the formulation of a general statistical-mechanistic Bayesian hierarchical model. The open-source probabilistic programming language Stan is used for the requisite computation. Through sensitivity analysis and out-of-sample forecasting, we find that our simple, interpretable model provides quantifiable evidence for how reductions in human mobility altered early case dynamics in New York City.
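A minimal sketch of the general mechanism, not the paper’s actual model: in a discrete-time SIR system, a mobility index can scale the transmission rate, so a drop in mobility lowers the effective contact rate. All parameter values and the mobility trajectory below are illustrative assumptions.

```python
# Illustrative discrete-time SIR system with a mobility-scaled transmission
# rate (toy parameters; NOT the paper's calibrated Stan model).
import numpy as np

N = 8_400_000            # rough NYC population, assumed for illustration
beta, gamma = 0.35, 0.1  # hypothetical transmission and recovery rates
T = 90                   # 90-day horizon, matching the study period

# Hypothetical mobility index: declines from 1.0 toward 0.3 around day 20.
t = np.arange(T)
mobility = 1.0 - 0.7 / (1.0 + np.exp(-(t - 20) / 3.0))

S, I, R = np.empty(T), np.empty(T), np.empty(T)
S[0], I[0], R[0] = N - 100, 100.0, 0.0
for k in range(T - 1):
    # Mobility multiplies beta, so reduced movement damps new infections.
    new_inf = beta * mobility[k] * S[k] * I[k] / N
    new_rec = gamma * I[k]
    S[k + 1] = S[k] - new_inf
    I[k + 1] = I[k] + new_inf - new_rec
    R[k + 1] = R[k] + new_rec

print(round(I.max()))  # peak prevalence under the assumed mobility drop
```

In the paper’s fuller treatment, the mobility data and case counts are modeled jointly and the parameters are calibrated within a Bayesian hierarchical model in Stan, rather than fixed by hand as here.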
The English Premier League is well-known for being not only one of the most popular professional sports leagues in the world, but also one of the toughest competitions to predict. The first purpose of this research was to verify the consistency between goal scoring in the English Premier League and the Poisson process; specifically, the relationships between the number of goals scored in a match and the Poisson distribution, the time between goals throughout the course of a season and the exponential distribution, and the time location of goals during football games and the continuous uniform distribution. We found that the Poisson process and the three probability distributions accurately describe Premier League goal scoring. In addition, Poisson regression was utilized to predict outcomes for a Premier League season, using different sets of season data and a large number of simulations. We examined and compared various soccer metrics from our simulation results, including an English club’s chances of winning the title, finishing in the top four, and finishing in the bottom three, as well as relegation points.
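The Poisson machinery behind such match simulation can be sketched as follows. The scoring rates here are illustrative assumptions rather than estimates from Premier League data: each side’s goal count is drawn from a Poisson distribution, and, consistent with a homogeneous Poisson process, goal times given the count are uniform over the 90 minutes.

```python
# Sketch of Poisson-based match simulation (hypothetical scoring rates,
# not fitted Premier League estimates).
import numpy as np

rng = np.random.default_rng(1)
lam_home, lam_away = 1.5, 1.1  # assumed mean goals per match for each side

def simulate_match(lh, la, rng):
    """Draw Poisson goal counts and uniform goal times for one match."""
    home, away = rng.poisson(lh), rng.poisson(la)
    # Given the counts, goal times under a homogeneous Poisson process
    # are i.i.d. uniform on the 90 minutes.
    times = np.sort(rng.uniform(0.0, 90.0, size=home + away))
    return home, away, times

# Repeating over many simulated matches yields win/draw/loss frequencies,
# from which season-level metrics can be accumulated.
results = [simulate_match(lam_home, lam_away, rng)[:2] for _ in range(10_000)]
home_wins = sum(h > a for h, a in results) / len(results)
print(f"simulated home win probability ≈ {home_wins:.3f}")
```

Replacing the fixed rates with team- and opponent-specific rates from a fitted Poisson regression, and repeating over a full fixture list, gives the kind of season simulation described in the abstract.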